Posted by birdculture 12/26/2025

Package managers keep using Git as a database, it never works out (nesbitt.io)
784 points | 465 comments | page 3
bencornia 12/26/2025
> Grab’s engineering team went from 18 minutes for go get to 12 seconds after deploying a module proxy. That’s not a typo. Eighteen minutes down to twelve seconds.

> The problem was that go get needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies. Cloning entire repositories to get a single file.

I have also had inconsistent performance with go get. Never enough to look closely at it. I wonder if I was running into the same issue?

zahlman 12/26/2025
> needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies.

Python used to have this problem as well (technically still does, but a large majority of things are available as a wheel and PyPI generally publishes a separate .metadata file for those wheels), but at least it was only a question of downloading and unpacking an archive file, not cloning an entire repo. Sheesh.

Why would Go need to do that, though? Isn't the go.mod file in a specific place relative to the package root in the repo?

klooney 12/26/2025
Go's lock files arrived at around the same time as the proxy; before then, you didn't have transitive dependencies pre-baked.
fireflash38 12/26/2025
How long ago were you having issues? That was changed in Go 1.13.
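For context, Go 1.13 is when the module proxy became the default, controlled by the GOPROXY setting. A minimal sketch of the two modes (values are the documented defaults, not something from this thread):

```shell
# Since Go 1.13 the default is to fetch modules through the public proxy,
# falling back to direct VCS access only when the proxy can't serve one:
export GOPROXY=https://proxy.golang.org,direct

# Opting out reproduces the old clone-every-repository behavior:
export GOPROXY=direct
```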
the__alchemist 12/26/2025
The Cargo example at the top is striking. Whenever I publish a crate, and it blocks me until I write `--allow-dirty`, I am reminded that there is a conflation between Cargo/crates.io and Git that should not exist. I will write `--allow-dirty` because I think these are two separate functionalities that should not be coupled. Crates.io should not know about or care about my project's Git usage or lack thereof.
cesarb 12/27/2025
> The Cargo example at the top is striking. Whenever I publish a crate, and it blocks me until I write `--allow-dirty`, I am reminded that there is a conflation between Cargo/crates.io and Git that should not exist. I will write `--allow-dirty` because I think these are two separate functionalities that should not be coupled.

That's completely unrelated.

The --allow-dirty flag is to bypass a local safety check which prevents you from accidentally publishing a crate with changes which haven't been committed to your local git repository. It has no relation at all to the use of git for the index of packages.

> Crates.io should not know about or care about my project's Git usage or lack thereof.

There are good reasons to know or care. The first is to provide a link from the crates.io page to your canonical version control repository. The second is to add a file containing the original commit identifier (a commit hash, in the case of git) that was used to generate the package, to simplify auditing that the package contents match what's in the version control repository (a defense against supply chain attacks). Both are optional.
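If I'm not mistaken, the commit-stamp file described here is `.cargo_vcs_info.json`, placed at the root of the published `.crate` archive. A sketch of its shape (the all-zeros hash is a placeholder; in a real crate, `cargo publish` writes the actual HEAD commit):

```shell
# Recreate the shape of cargo's VCS stamp locally for illustration.
cat > .cargo_vcs_info.json <<'EOF'
{
  "git": {
    "sha1": "0000000000000000000000000000000000000000"
  },
  "path_in_vcs": ""
}
EOF
grep -c '"sha1"' .cargo_vcs_info.json   # prints 1
```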

the__alchemist 12/27/2025
Those are great points, and reinforce the concept that there is conflation between Cargo and Git/commits. Commits and Cargo IMO should be separate concepts. Cargo should not be checking my Git history prior to publishing.
frumplestlatz 12/26/2025
Since ~2002, MacPorts has used svn or git, but users, by default, rsync the complete port definitions + a server-generated index + a signature.

The index is used for all lookups; it can also be generated or incrementally updated client-side to accommodate local changes.

This has worked fine for literally decades, starting back when bandwidth and CPU power were far more limited.

The problem isn’t using SCM, and the solutions have been known for a very long time.

ekjhgkejhgk 12/26/2025
Uncertain if this is OT, but given that the CCC is a politically inspired organization, I hope not:

One thing that still seems absent is awareness of the complete takeover of "gadgets" in schools. Schools these days, as early as primary school, shove screens in front of children. They're expected to look at them, and "use" them for various activities, including practicing handwriting. I wish I was joking [1].

I see two problems with this.

First is that these devices are engineered to be addictive by way of constant notifications/distractions, and learning is something that requires long sustained focus. There's a lot of data showing that under certain common circumstances, you do worse learning from a screen than from paper.

Second, it implicitly trains children to expect that everything has to be done through a screen connected to a closed point-and-click platform. (Uninformed) people will say "people who work with computers make money, so I want my child to have an iPad". But interacting with a closed platform like an iPad removes possibilities and puts the interaction "on rails". You don't learn to think, explore, and learn from mistakes; instead you learn to use the app that's put in front of you. This in turn reinforces the "computer says no" [2] approach to understanding the world.

I think this is a matter of civil rights and freedom, but sadly I don't often see "civil rights" organizations talk about this. I think I heard Stallman say something along these lines once, but other than that I don't see campaigns anywhere.

[1] https://www.letterjoin.co.uk/

[2] https://youtu.be/eE9vO-DTNZc

AceJohnny2 12/26/2025
It looks like you commented on the wrong post, although I don't immediately see a front-page post about the ongoing Chaos Computer Congress.
kzrdude 12/26/2025
it's here: https://news.ycombinator.com/item?id=46386211 (and it was still on the front page as of this writing)
ekjhgkejhgk 12/26/2025
ty
ekjhgkejhgk 12/26/2025
LOL sorry. You're right. I'll copy paste over there.
teiferer 12/26/2025
And this, my friends, is the reason why focusing (only) on CPU cycles and memory hierarchies is insufficient when thinking about the performance of a system. Yes, they are important. But no level of low-level optimization will get you out of the hole that a wrong choice of algorithm and/or data structure may have dug you into.
leoh 12/26/2025
The conclusion reached in this essay is 100% wrong. See "The reftable backend: What it is, where it's headed, and why should you care?"

> With release 2.45, Git has gained support for the “reftable” backend to read and write references in a Git repository. While this was a significant milestone for Git, it wasn’t the end of GitLab’s journey to improve scalability in repositories with many references. In this talk you will learn what the reftable backend is, what work we did to improve it even further and why you should care.

https://www.youtube.com/watch?v=0UkonBcLeAo

Also see Scalar, which Microsoft used to scale their 300GiB Windows repository, https://github.com/microsoft/scalar.

mukundesh 12/26/2025
Though not GitHub, Hugging Face is worth mentioning: it also uses git, but manages large files with their(?) Xet protocol. https://huggingface.co/docs/hub/en/xet/index
bandrami 12/26/2025
Maybe I'm misreading the article, but isn't every example about the downside of using GitHub as a database host, not the downside of using git as a database?

Like, yes, you should host your own database. This doesn't seem like an argument against that database being git.

zzo38computer 12/26/2025
Git commits will have a hash and each file will have a hash, which means that locking is unnecessary for read access. (This is also true of Fossil, although Fossil does have locking since it uses SQLite.)
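The no-locking-for-reads property follows from content addressing: an object's ID is a hash of its contents, so objects are immutable once written and a reader can never observe a half-updated object. This is easy to check even without git installed (the hash below is the well-known blob ID for the 6-byte content "hello\n"):

```shell
# A git blob ID is sha1("blob <size>" + NUL byte + contents). For "hello\n"
# this reproduces what `echo hello | git hash-object --stdin` prints.
printf 'blob 6\0hello\n' | sha1sum
# ce013625030ba8dba906f756967f9e9ca394464a  -
```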

The other stuff mentioned in the article seems like valid criticism.
