> Furthermore, the evil.c file with the same SHA1 hash would need to be valid C ...

tjoff · on Feb 4, 2020

Many diff tools don't highlight whitespace-only changes, or at least not clearly.
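To make the whitespace point concrete, here is a minimal Python sketch that flags lines changed only in whitespace, the kind of edit a diff view may render faintly or not at all. The function name `whitespace_only_changes` is hypothetical. It is a crude heuristic: it removes *all* whitespace before comparing, so it would also call `int x` → `intx` "whitespace-only", even though that change matters in C.

```python
import difflib


def whitespace_only_changes(old: list[str], new: list[str]) -> list[tuple[int, int]]:
    """Return (old_index, new_index) pairs for lines that differ only in whitespace.

    Crude heuristic (an assumption, not any diff tool's real logic): two lines
    count as "whitespace-only" changes if they are equal once every whitespace
    character is deleted.
    """
    pairs = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only same-sized 'replace' blocks can be paired up line by line.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for i, j in zip(range(i1, i2), range(j1, j2)):
            squashed_old = "".join(old[i].split())
            squashed_new = "".join(new[j].split())
            if old[i] != new[j] and squashed_old == squashed_new:
                pairs.append((i, j))
    return pairs
```

For example, `whitespace_only_changes(["return x;\n"], ["return  x;\n"])` reports the pair `(0, 0)`, which a reviewer could then inspect deliberately instead of trusting the diff view.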

Also, if something is replaced in the history, how often do people go back and view diffs of old code? Hardly often enough to rely on the change being spotted.

wyoung2 · on Feb 4, 2020

It only takes one person to raise the flag.

Sure, many thousands of people doing blind "git clone && configure && sudo make install" could be burned by a problem like this, but someone would eventually do a diff and see the problem on any project big enough to have those thousands of trusting users in the first place.

I'm not excusing these SHA-1 weaknesses, only pointing out that it won't be trivial to apply them to program source code repos no matter how cheap the attacks get.

For instance, the demonstration case for SHAttered was a pair of PDFs: humans can't reasonably inspect those to find whatever noise had to be stuffed into them to achieve the result.
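To illustrate why colliding inputs end up carrying "noise", here is a toy Python sketch. It is *not* how SHAttered worked (that was a cryptanalytic attack on full SHA-1); it simply runs a birthday search against SHA-1 truncated to 3 bytes (24 bits), appending random filler to a fixed prefix until two inputs share a hash. The function names and the truncation length are illustrative assumptions.

```python
import hashlib
import os


def truncated_sha1(data: bytes, nbytes: int) -> bytes:
    """First `nbytes` bytes of SHA-1(data): a deliberately weakened toy hash."""
    return hashlib.sha1(data).digest()[:nbytes]


def find_toy_collision(prefix: bytes, nbytes: int = 3, max_tries: int = 1 << 20):
    """Birthday search: append pseudorandom filler to `prefix` until two
    different inputs share the same truncated SHA-1 value.

    Returns a pair (a, b) of distinct colliding inputs, or None if none is
    found within `max_tries` attempts.
    """
    seen: dict[bytes, bytes] = {}
    for _ in range(max_tries):
        # The filler is the "noise" a human reviewer would see in the file.
        candidate = prefix + os.urandom(16).hex().encode()
        h = truncated_sha1(candidate, nbytes)
        other = seen.get(h)
        if other is not None and other != candidate:
            return other, candidate
        seen[h] = candidate
    return None
```

With a 24-bit hash this typically succeeds after a few thousand tries. Both colliding inputs end in an unexplained run of hex gibberish, which is exactly the kind of residue a PDF can hide and a source file cannot.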

I also understand that these SHA-1 weaknesses have been used to attack X.509 certificates, but there again the case is very unlike a software code repo: the one doing the checking isn't another programmer but a program.

remram · on Feb 4, 2020

The problem is that we are considering an issue where different people can get different objects for the same hash. If the people checking all see the valid files, they cannot raise any alarms to save the poor victims who got poisoned with the wrong objects. They'll clone from the wrong fork, and no amount of checking hashes or signed tags will prevent them from running compromised code.
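For context on why "same hash, different object" is so dangerous here: in a SHA-1 Git repository, a file's blob ID is the SHA-1 of a short header (`blob`, a space, the byte length, a NUL byte) followed by the raw bytes. The sketch below reproduces that computation in Python; the function name `git_blob_id` is hypothetical, but the result matches what `git hash-object` prints for the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git would assign to `content` as a blob
    (SHA-1 object format).

    The header b"blob <size>" plus a NUL byte is prepended to the raw bytes
    before hashing. The ID is derived purely from those bytes, so two blobs
    with the same ID are interchangeable as far as Git can tell.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

For instance, `git_blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, Git's well-known empty-blob ID. Commits and signed tags ultimately point at IDs like this one, so if two different blobs share an ID, a signature over the tag vouches equally for both.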

wyoung2 · on Feb 4, 2020

> If the people checking all see the valid files

...which will likely contain thousands of bytes of pseudorandom data in order to force the hash collision...
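As a rough illustration of how such filler could even be flagged mechanically, here is a hypothetical Python heuristic that scans the added lines of a unified diff and reports long lines with unusually high character entropy. The function names and both thresholds are arbitrary assumptions. Note that hex text caps out at 4 bits per character, so this particular threshold only catches wider alphabets (base64-like or mixed-case strings); binary noise would show up in a diff as a binary-file change anyway.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution of `text`."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def suspicious_added_lines(diff_lines, min_len=32, threshold=4.5):
    """Return bodies of '+' lines (excluding the '+++' file header) that are
    at least `min_len` characters long and exceed `threshold` bits/char.

    Hypothetical heuristic: ordinary code lines usually reuse a small set of
    characters, while random-looking filler uses many distinct ones.
    """
    flagged = []
    for line in diff_lines:
        if line.startswith("+") and not line.startswith("+++"):
            body = line[1:].strip()
            if len(body) >= min_len and shannon_entropy(body) > threshold:
                flagged.append(body)
    return flagged
```

This is no substitute for the human review described here; it only shows that pseudorandom stuffing is statistically unlike normal source text.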

> they cannot raise any alarms

You think a human won't be able to notice that the diff from the last version they tested looks awfully funny? Code that can fool the compiler into producing an evil binary is one thing, but code that can pass a human code review is quite another.

You might be surprised how often that kind of review happens.

I don't do a diff before each third-party DVCS repo pull, but I do diff the code when integrating such third-party code into my projects, if only so I understand what they've done since the last time I updated. Commit messages, ChangeLogs, and release announcements only get you so far.
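A minimal Python sketch of the first step of that integration review: comparing two unpacked versions of a third-party source tree to see which files were added, removed, or modified, so you know which ones to read diffs for. The function names are hypothetical. It hashes file contents with SHA-256 locally, independent of whatever hash the upstream VCS uses.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(old_root: Path, new_root: Path) -> dict[str, list[str]]:
    """Classify files as added, removed, or modified between two source trees."""
    old, new = tree_digest(old_root), tree_digest(new_root)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

The output is only a reading list; the actual review still means looking at each modified file's diff, which is where unexplained blobs of random-looking data would stand out.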

Back when I was producing binary packages for a popular software distribution, I'd often be forced to diff the code when producing new binaries, since several of the popular binary package distribution systems are based on patches atop pristine upstream source packages. (RPM, DEB, Cygwin packages...)

Each time a binary package maintainer updates to a new upstream release, there's a good chance they've had to diff the two versions to work out how to apply their old distro-specific patches atop the new codebase.

Someone's going to notice the first time this happens, and my guess is that it'll happen rather quickly.

remram · on Feb 4, 2020

If this is your threat model, you don't need hashes or signed tags at all. Good for you. Thankfully both Fossil and Git disagree with you and take the threat seriously :)

seniorsassycat · on Feb 4, 2020

That's an argument for why you shouldn't worry about SHA-1 attacks in source control, but when discussing how to mitigate the attack, we should take it for granted.

If we weren't worried about SHA-1 collisions in Git, we wouldn't be switching to a new hash function.

wyoung2 · on Feb 4, 2020

When is the right time to worry? Maybe wait until someone publishes a practical attack, then wait years for the new code to get sufficiently far out into the world that you can switch to it?

I mean, I see you're expressing concern, but the first major red flag on this went up three years ago, and another big one went up last month (https://sha-mbles.github.io/).

When we dealt with this same problem over in Fossil land, we ended up needing to wait most of three years for Debian to finally ship a new enough binary that we could switch the default to SHA-3. Fortunately (?) RHEL doesn't ship Fossil, else we'd likely have had to wait even longer.
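Why the switch has to wait on distro binaries can be sketched in a few lines. A repository that moves from SHA-1 to SHA3-256 ends up holding both 40-hex-digit and 64-hex-digit artifact IDs. The Python sketch below assumes, for illustration only, that an ID is the plain hash of the artifact's bytes and that the algorithm is chosen by ID length; the names are hypothetical and this is not Fossil's actual code.

```python
import hashlib

# Hypothetical table: which hash algorithm a given ID length implies.
# An older client would only have the 40-digit entry.
HASHES_BY_HEX_LEN = {
    40: hashlib.sha1,      # legacy IDs
    64: hashlib.sha3_256,  # newer IDs
}


def artifact_id(content: bytes) -> str:
    """Name a new artifact with SHA3-256 (assumed: hash of the raw bytes)."""
    return hashlib.sha3_256(content).hexdigest()


def verify(artifact_hex: str, content: bytes) -> bool:
    """Check `content` against an ID, picking the algorithm by ID length."""
    algo = HASHES_BY_HEX_LEN.get(len(artifact_hex))
    if algo is None:
        raise ValueError(f"unrecognized artifact ID length: {len(artifact_hex)}")
    return algo(content).hexdigest() == artifact_hex.lower()
```

A client built before the 64-digit entry existed would raise on every new-style ID, so the default can't flip until the old binaries have aged out of the stable distributions people actually run.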

On top of that, Git has tremendously more inertia. Git has to wait out not only the Debian and RHEL stable package policies but also all of that infrastructure tooling they brag about. Every random programmer's editor, merge tool, Git front end... everything a project depends on will have to convert over before that one project can move to a post-SHA-1 future.

This is going to be a colossal mess.