I don't know why you'd trust a checksum structure your adversary has complete control over.

The Merkle tree only prevents the naive case, where the adversary serves a client who already has an older version of a repo a new version that differs in a part the client already has (the part the client has local checksums for). A competent adversary simply wouldn't do that. The git client tells the server what commits it doesn't have, so this is simple to check.

Code signing could be a safeguard if people did it, but here they don't, so it's moot. I found no mention of a signing key in this repo's docs.

The checksum tree could be a useful audit if there were a transparency log somewhere that git tools automatically checked against, but there isn't, so it's moot. We put full trust in Microsoft's versions.

Lots of things could be helpful, but here and now in front of us is a source tree fully in Microsoft's control, with no visible safeguards against Microsoft doing something evil to it. Just like countless others. It's the default state of trust today.

▲

bbarnett 4 days ago | parent | next [-]

> Lots of things could be helpful, but here and now in front of us is a source tree fully in Microsoft's control, with no visible safeguards against Microsoft doing something evil to it. Just like countless others

But it's written in Rust.

▲

Aloisius 4 days ago | parent | prev | next [-]

> The git client tells the server what commits it doesn't have, so this is simple to check.

That won't work. The first thing the client does is ask the server for a list of references with their oids (ls-refs). Only after the server responds does it request oids and report which oids it already has.

You'd need another way to identify that the client asking for references was the same one you vended the tampered source tree to. Otherwise, you'd need to respond with the refs' real oids, and the fetch would fail since there's no way to get from the oid the user has to the real one.
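To make the ordering concrete, here is a toy sketch in Python. It is not the real wire protocol (which uses pkt-lines and capability negotiation); `toy_fetch`, its arguments, and the short oids are all made up for illustration. It only models the point above: the server has to advertise refs before it has seen any of the client's oids.

```python
def toy_fetch(server_refs, client_oids):
    """Simplified model of one fetch. Returns (log, wants, haves).

    server_refs: dict of ref name -> oid (hypothetical values)
    client_oids: set of oids the client already has locally
    """
    log = []

    # Step 1: the client asks for refs. It has revealed nothing about
    # its local objects at this point.
    log.append("client: ls-refs")

    # Step 2: the server has to commit to an answer here.
    advertised = dict(server_refs)
    log.append("server: advertise " + ", ".join(
        f"{name}={oid}" for name, oid in sorted(advertised.items())))

    # Step 3: only now does the client say what it wants and what it has.
    wants = sorted(set(advertised.values()) - set(client_oids))
    haves = sorted(client_oids)
    log.append(f"client: fetch want={wants} have={haves}")

    return log, wants, haves


log, wants, haves = toy_fetch({"refs/heads/main": "bbb222"}, {"aaa111"})
for line in log:
    print(line)
```

In this model, a server that wants to keep serving a tampered history has to decide which oids to advertise in step 2, before the "have" lines in step 3 could tell it which history the client holds.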

▲

cyberpunk 4 days ago | parent [-]

Or use signed commits?

▲

marginalia_nu 4 days ago | parent | prev | next [-]

Because the developers have just that on their local machine...?

Git is a distributed VCS after all. Every checkout is its own complete git "hub".

▲

perihelions 4 days ago | parent [-]

Because GitHub can serve different bytes to different people. You log in as one of the project's devs, you get your own consistent, correct view of your project; some other people get malware instead. How do you reconcile the full picture? No one distrusts GitHub. There's no public log which git tools generically check against to see if GitHub is attempting something evil, the way they do with certificate transparency. GitHub is the public log.

Git may be designed as a distributed VCS, and it'd be a different situation if it were used that way in practice. For many projects, GitHub has a full MITM. Forget about the checksums: they could even bifurcate the views between devs, accepting commits from one dev and sending those commits, with translated Merkle trees, to another dev who has a corrupted history, and neither would ever figure it out.

▲

BobaFloutist 4 days ago | parent | next [-]

What happens when a dev tries to patch a bug in the malware and nobody can tell what the hell they're talking about?

▲

saagarjha 4 days ago | parent | prev [-]

Yes, but the moment you try to push, your local git will complain that you are not aligned with the upstream repo.

▲

perihelions 4 days ago | parent [-]

Not so. GitHub would remember who you are; advertise to you and to you only a set of fake checksums consistent with your fake view of the repo. Your git client would see nothing amiss—your local fake checksums are consistent with the fake checksums the server sent you. Having accepted your push, the server would ignore the fake checksums, extract the content of your patch, apply it to the genuine repo, and compute a new set of checksums, extending the other checksum tree as if you had pushed directly to it. That's what an MITM is.
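A rough sketch of that translation step in Python, under heavy simplifying assumptions: one file per repo, a "patch" that only appends bytes, and a fixed made-up author identity. The object hashing follows git's SHA-1 scheme (`sha1("<type> <size>\0" + body)`), but `append_commit`, `translate_push`, and the file name `main.rs` are hypothetical names for illustration, not anything git or GitHub provides.

```python
import hashlib


def git_oid(kind, body):
    # git hashes "<kind> <size>\0" followed by the object body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_oid(name, data):
    # A tree with one regular file: "100644 <name>\0" + raw 20-byte blob oid.
    blob = git_oid("blob", data)
    entry = b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    return git_oid("tree", entry)


def commit_oid(tree, parent, message):
    ident = "Dev <dev@example.com> 0 +0000"  # made-up identity
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_oid("commit", ("\n".join(lines) + "\n").encode())


def append_commit(history, content, message):
    # Return a new history with one more commit on top.
    parent = history[-1]["oid"] if history else None
    oid = commit_oid(tree_oid("main.rs", content), parent, message)
    entry = {"oid": oid, "parent": parent,
             "content": content, "message": message}
    return history + [entry]


def translate_push(pushed, served_view, real_view):
    # Toy "diff": assume the pushed commit only appended bytes to the
    # content the victim was served.
    base = served_view[-1]["content"]
    new = pushed[-1]["content"]
    assert new.startswith(base)
    delta = new[len(base):]
    # Re-apply the same change on top of the other history.
    return append_commit(real_view, real_view[-1]["content"] + delta,
                         pushed[-1]["message"])


genuine = append_commit([], b"fn main() {}\n", "init")
fake = append_commit([], b"fn main() { evil(); }\n", "init")

# The victim commits on top of the fake view and pushes.
victim = append_commit(fake, fake[-1]["content"] + b"// fix typo\n",
                       "fix typo")
genuine_after = translate_push(victim, fake, genuine)

print(victim[-1]["oid"], "parent", victim[-1]["parent"])
print(genuine_after[-1]["oid"], "parent", genuine_after[-1]["parent"])
```

The two tip oids differ, yet each chain is internally consistent: every commit hashes correctly and points at a parent that exists in its own history. A client that only ever sees one of the two chains has nothing to compare against.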

▲

saagarjha 4 days ago | parent [-]

This falls apart instantly if you share a hash with anyone else, though. Which is exactly what happens when you send in a PR.

▲

account42 4 days ago | parent [-]

Most projects on GitHub have you submit PRs via GitHub infrastructure, so they have total control over who sees what there as well.

▲

rstuart4133 4 days ago | parent | prev [-]

> I don't know why you'd trust a checksum structure your adversary has complete control over.

I think the point is they don't have complete control over it. Sure, they have complete control over the version that is on GitHub. But git is distributed, and the developers will have their own local copies. Git checks the checksums, so suppose Microsoft screwed with them. The next developer pull or push would blow up.

▲

perihelions 4 days ago | parent [-]

> "The next developer pull or push would blow up."

If they're pushing or pulling to/from GitHub, then GitHub has a total MITM and is able to dynamically translate checksum trees in between devs' incompatible views of the repo.

▲

cycomanic 4 days ago | parent [-]

I don't understand. Can you explain how that would work? I thought the checksums are calculated on the contents, so how can they translate checksum trees that remain valid without changing the content (or vice versa)? This is my naive understanding, so I might be completely wrong, hence I ask.

▲

perihelions 4 days ago | parent [-]

That they'd change the content is the point—offer malware content for select targets, with corresponding malware checksums that are consistent with that malware and its entire history.

Those checksums would seem valid to the victims, as they're a self-consistent history of checksum trees they got directly from GitHub. The devs would be working with different checksum trees. GitHub would maintain both versions, serving different content and different checksums depending on who asks.
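A small Python sketch of why both versions look valid, assuming a drastically simplified object store that holds only blobs keyed by their git-style SHA-1 oid. `build_store` and `verify_store` are hypothetical helpers, and the file contents are invented. The check recomputes each oid from the content, which is roughly the only kind of check a client can do with nothing but the server's data.

```python
import hashlib


def git_blob_oid(data):
    # git's blob oid: sha1 over "blob <size>\0" followed by the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def build_store(files):
    # Map oid -> content, as a stand-in for an object database.
    return {git_blob_oid(data): data for data in files}


def verify_store(store):
    # Recompute every oid from its content and compare with the key.
    return all(git_blob_oid(data) == oid for oid, data in store.items())


genuine = build_store([b"fn main() {}\n"])
malware = build_store([b"fn main() { evil(); }\n"])

print(verify_store(genuine), verify_store(malware))
print(set(genuine) == set(malware))
```

Both stores pass verification, and their oids differ. The checksums tie each oid to its content; they say nothing about whether the content is what everyone else was served.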

▲

rstuart4133 3 days ago | parent [-]

This seems to boil down to them keeping two repositories: presenting one to the logged-in dev, and one to the public. That might work for a while if the dev isn't active. He would, for example, have to not notice there was a new release with an incremented version number that triggers updates. Even that doesn't work forever. Downstream devs often look at the changes; for example, a Debian maintainer usually runs his eye over them.

But if the dev is active, this is going to be noticed pretty quickly. The branches will diverge: commit messages, feature announcements, bug reports, and line numbers won't match up. It would require a skilled operator to keep them loosely in sync, and that's the best they could do.

Either way, sooner or later Microsoft's subterfuge would be discovered, and that is the death knell for this scenario. The outrage here and elsewhere would boil over. Open source would leave GitHub en masse, Microsoft's reputation would be destroyed, and they would lose top engineers. I don't have a high opinion of Microsoft's technical skills and leadership, as they have consistently demonstrated themselves to be inconsistent and unreliable. But the company is too large and too successful to be psychotic. The shareholders, customers, and lawyers would have someone's guts for garters if they pulled a stunt like that.