Posted by kwantaz 10/27/2024
- When you have multiple files in the repo which have the same trailing 16 characters in the repo path, git may wrongly calculate deltas, mixing up between those files. In here they had multiple CHANGELOG.md files mixed up.
- So if those files are big and change often, you end up with massive deltas and inflated repo size.
- There's a new git option (in Microsoft git fork for now) and config to use full file path to calculate those deltas, which fixes the issue when pushing, and locally repacking the repo.
```
git repack -adf --path-walk
git config --global pack.usePathWalk true
```
- According to a screenshot, Chromium repacked in this way shrinks from 100GB to 22GB.
- However AFAIU until GitHub enables it by default, GitHub clones from such repos will still be inflated.
Also, thank you for the TLDR!
Fixing an existing repository requires a full repack, and for a repository as big as Chromium it still takes more than half a day (56000 seconds is 15h30), even if that's an improvement over the previous 3 days it's a lot of compute.
From my experience of previous attempts, trying to get Github to run a full repack with harsh settings is extremely difficult (possibly because their infrastructure relies on more loosely packed repositories), I tried to get that for $dayjob's primary repository whose initial checkout had gotten pretty large and got nowhere.
As of right now, said repository is ~9.5GB on disk on initial clone (full, not partial, excluding working copy). Locally running `repack -adf --window 250` brings it down to ~1.5GB, at the cost of a few hours of CPU.
The repository does have some of the attributes described in TFA, so I'm definitely looking forward to trying these changes out.
For now we’re getting by with partial clones, and employee machines being imaged with a decently up to date repository.
Wait, what? Has MS forked git?
These are not forks-going-their-own-way forks.
This surely cannot be correct. Even the title of the linked article doesn't use "shranked". What?