
> Furthermore, the evil.c file with the same SHA1 hash would need to be valid C code that does something evil while still yielding the same hash

...and also produce an innocent-looking diff!

I mean, you could stuff a bunch of random bytes into a C comment to force the desired hash in the output using these documented attack techniques, but anyone inspecting the diffs between versions is likely to see such an explosion of noise and call foul.

If you want an analogy, it's like someone saying they've learned to forge federal agent identification cards, only the trick requires the person carrying the fake ID to have a thousand rainbow-dyed ducks on a leash in tow behind him.

Such attacks are fine when it's dumb software systems doing the checks, but for a source code repository where people do in fact visually check the diffs occasionally?

Well, let's just say that when someone manages to use SHAttered and/or SHAmbles type attacks on Git (or even Fossil) I expect that it won't take a genius detective to see that the repo's been attacked.
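
To put a rough number on "explosion of noise": one way a reviewer or a CI hook could flag it is to measure the byte entropy of the added lines in a diff. A minimal Python sketch follows. The function names, the 64-byte minimum, and the 5.0 bits-per-byte threshold are my own assumptions, not anything Git or Fossil ships, and a real collision payload could be split across lines or tucked into a binary file where this heuristic would miss it.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_added_lines(diff_bytes: bytes, min_len: int = 64, threshold: float = 5.0):
    """Yield (line_number, entropy) for long, high-entropy added lines in a unified diff.

    min_len and threshold are illustrative guesses, not tuned values.
    """
    for number, line in enumerate(diff_bytes.split(b"\n"), start=1):
        # Only look at added lines, and skip the "+++ b/file" header line.
        if not line.startswith(b"+") or line.startswith(b"+++"):
            continue
        payload = line[1:]
        if len(payload) >= min_len:
            entropy = shannon_entropy(payload)
            if entropy >= threshold:
                yield number, entropy
```

A line that uses no more than 32 distinct characters cannot exceed 5 bits per byte, and ordinary source lines tend to stay in that range, while a long run of near-random bytes lands well above it.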



Many diff tools don't highlight whitespace-only changes. Or at least not in a clear manner.

Also, if something is replaced in the history how often do people go back and view diffs in old code? Hardly often enough to rely on it being spotted.
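
To make the whitespace point concrete: catching those changes means looking for them on purpose. Here's a small Python sketch using the standard difflib module. The function name is made up for illustration, and it only handles the simple case where a block of lines is replaced by the same number of lines.

```python
import difflib


def whitespace_only_changes(old: str, new: str):
    """Yield (old_line, new_line) pairs that differ only in whitespace.

    Simplification: only inspects 'replace' blocks where the number of
    lines is unchanged, pairing lines up by position.
    """
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for before, after in zip(old_lines[i1:i2], new_lines[j1:j2]):
            # Lines differ, but are identical once all whitespace is stripped out.
            if before != after and "".join(before.split()) == "".join(after.split()):
                yield before, after
```

Printing the pairs with repr() makes tabs and trailing spaces visible, which is exactly what a lot of diff viewers don't do by default.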


It only takes one person to raise the flag.

Sure, many thousands of people doing blind "git clone && configure && sudo make install" could be burned by a problem like this, but someone would eventually do a diff and see the problem on any project big enough to have those thousands of trusting users in the first place.

I'm not excusing these SHA-1 weaknesses, only pointing out that it won't be trivial to apply them to program source code repos no matter how cheap the attacks get.

For instance, the demonstration case for SHAttered was a pair of PDFs: humans can't reasonably inspect those to find whatever noise had to be stuffed into them to achieve the result.

I also understand that these SHA-1 weaknesses have been used to attack X.509 certificates, but there again you have a case very unlike a software code repo, where the one doing the checking isn't another programmer but a program.


The problem is that we are considering an issue where different people can get different objects for the same hash. If the people checking all see the valid files, they cannot raise any alarms to save the poor victims who got poisoned with the wrong objects. They'll clone from the wrong fork, and no amount of checking hashes or signed tags will prevent them from running compromised code.
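
For anyone who wants to see what "same hash, different object" means mechanically: Git names a blob by the SHA-1 of a short header plus the file contents. The Python sketch below computes that ID and then checks a second, independent digest. The helper names are hypothetical, and the comparison only helps if the two parties can exchange the second digest over a channel the attacker doesn't control, which is an assumption on my part.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """SHA-1 object ID Git assigns to a blob: hash of 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def same_id_different_content(mine: bytes, theirs: bytes) -> bool:
    """True if two blobs share a SHA-1 object ID but differ under SHA-256.

    Hypothetical check: a True result would mean a SHA-1 collision was served.
    """
    same_id = git_blob_sha1(mine) == git_blob_sha1(theirs)
    same_bytes = hashlib.sha256(mine).digest() == hashlib.sha256(theirs).digest()
    return same_id and not same_bytes
```

With honest data this always returns False. The scenario described above is the one where it would return True, and the victim, holding only their own copy, has nothing to compare against.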


> If the people checking all see the valid files

...which will likely contain thousands of bytes of pseudorandom data in order to force the hash collision...

> they cannot raise any alarms

You think a human won't be able to notice that the diff from the last version they tested looks awfully funny? Code that can fool the compiler into producing an evil binary is one thing, but code that can pass a human code review is quite another.

You might be surprised how often people do actually read those diffs.

I don't do a diff before each third-party DVCS repo pull, but I do diff the code when integrating such third-party code into my projects, if only so I understand what they've done since the last time I updated. Commit messages, ChangeLogs, and release announcements only get you so far.

Back when I was producing binary packages for a popular software distribution, I'd often be forced to diff the code when producing new binaries, since several of the popular binary package distribution systems are based on patches atop pristine upstream source packages. (RPM, DEB, Cygwin packages...)

Each time a binary package creator updates, there's a good chance they've had to diff the versions to work out how to apply their old distro-specific patches atop the new codebase.

Someone's going to notice the first time this happens, and my guess is that it'll happen rather quickly.


If this is your threat model, you don't need hashes or signed tags at all. Good for you. Thankfully both Fossil and Git disagree with you and take the threat seriously :)


That's an argument for why you shouldn't worry about SHA-1 attacks in source control, but when we're discussing how to mitigate the attack, we should take its feasibility for granted.

If we weren't worried about SHA-1 collisions in Git, then we wouldn't be switching to a new hash function.


When is the right time to worry? Maybe wait until someone publishes a practical attack, then wait years for the new code to get sufficiently far out into the world that you can switch to it?

I mean, I see you're expressing concern, but the first major red flag on this went up three years ago, and another big one went up last month. (https://sha-mbles.github.io/)

When we dealt with this same problem over in Fossil land, we ended up needing to wait most of three years for Debian to finally ship a new enough binary that we could switch the default to SHA-3. Fortunately (?) RHEL doesn't ship Fossil, else we'd likely have had to wait even longer.

Atop that same problem, Git's also got tremendously more inertia. Git has to wait out not only the Debian and RHEL stable package policies but also all of that infrastructure tooling they brag on. Every random programmer's editor, merge tool, Git front end... all of that which a project depends on will have to convert over before that one project can move to a post-SHA-1 future.

This is going to be a colossal mess.



