
Posted by c17r 7 hours ago

The Future of Version Control (bramcohen.com)
270 points | 149 comments
a-dub 4 hours ago|
doesn't the side by side view in github diff solve this?

conflict free merging sounds cool, but doesn't that just mean that a human review step is replaced by "changes become intervals rather than collections of lines" and "last set of intervals always wins"? seems like it makes sense when the conflicts are resolved instantaneously during live editing, but does it still make sense with one shot code merges over long intervals of time? today's systems are "get the patch right" and then "get the merge right"... can automatic intervalization be trusted?

edit: actually really interesting if you think about it. crdts have been proven with character at a time edits and use of the mouse select tool.... these are inherently intervalized (select) or easy (character at a time). how does it work for larger patches that can have loads of small edits?

ballsweat 2 hours ago||
Cool timing.

I recently built Artifact: https://www.paganartifact.com/benny/artifact

Mirror: https://github.com/bennyschmidt/artifact

In case anyone was curious what a full rewrite of git would look like in Node!

The main difference is that on the server I only store deltas, not files, and the repo is “built”.

But yeah full alternative to git with familiar commands, and a hub to go with it.

mentalgear 4 hours ago||
> [CRDT] This means merges don’t need to find a common ancestor or traverse the DAG. Two states go in, one state comes out, and it’s always correct.

Well, isn't that what the CRDT does in its own data structure?

Also keep in mind that syntactic correctness doesn't mean functional correctness.

mweidner 29 minutes ago||
You can think of the semantics (i.e., specification) of any CRDT as a function that inputs the operation history DAG and outputs the resulting user-facing state. However, algorithms and implementations usually have a more programmatic description, like "here is a function `(internal state, new operation) -> new internal state`", both for efficiency (update speed; storing less info than the full history) and because DAGs are hard to reason about. But you do see the function-of-history approach in the paper "Pure Operation-Based Replicated Data Types" [1].

[1] https://arxiv.org/abs/1710.04469
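To make the distinction concrete, here is a minimal sketch (all names illustrative, not from the paper) of a state-based G-Counter CRDT: the implementation exposes a small `(state, op) -> state` update function plus a merge, rather than replaying the full operation-history DAG.

```python
# State-based G-Counter CRDT sketch. State is a per-replica count map;
# the user-facing value is a function of that state, not of the history DAG.

def increment(state: dict, replica: str) -> dict:
    """Apply one operation: bump this replica's entry."""
    new = dict(state)
    new[replica] = new.get(replica, 0) + 1
    return new

def merge(a: dict, b: dict) -> dict:
    """Join two states: pointwise max. Commutative, associative, idempotent."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def value(state: dict) -> int:
    return sum(state.values())

# Two replicas diverge, then merge: no common ancestor lookup needed.
a = increment(increment({}, "alice"), "alice")
b = increment({}, "bob")
assert value(merge(a, b)) == 3
assert merge(a, b) == merge(b, a)            # order doesn't matter
assert merge(merge(a, b), b) == merge(a, b)  # idempotent
```

The merge never consults the history; the pure op-based approach in [1] instead defines the result directly over the partially ordered operation history.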

Retr0id 3 hours ago||
Yes.

There are many ways to instantiate a CRDT, and a trivial one would be "last write wins" over the whole source tree state. LWW is obviously not what you'd want for source version control. It is "correct" per its own definition, but it is not useful.
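A sketch of that trivial whole-tree LWW CRDT (the timestamp/tree representation here is invented for illustration):

```python
# Whole-tree last-write-wins "CRDT": the later write replaces the entire
# source tree. Satisfies the CRDT merge laws, yet is useless as a VCS.

def merge_lww(a, b):
    # Higher timestamp wins wholesale. A real system would also need a
    # deterministic tiebreaker (e.g. replica id) for equal timestamps.
    return a if a[0] >= b[0] else b

ours   = (10, {"main.c": "int main() { return 0; }"})
theirs = (12, {"main.c": "int main() { return 1; }"})

# "Correct" per its own definition: both replicas converge...
assert merge_lww(ours, theirs) == merge_lww(theirs, ours) == theirs
# ...but our concurrent edit was silently discarded, not merged.
```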

Anyone saying "CRDTs solve this" without elaborating on the specifics of their CRDT is not saying very much at all.

braidedpubes 2 hours ago||
Do I have it right that it’s basically timestamp based, except not based on our clocks but on one it manages itself?

So as long as all updates have been sent to the server from all clients, it will know what “time” each character changed and be able to merge automatically.

Is that it basically?
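That is roughly the mechanism, at least for the simplest character-level designs. A hedged sketch of the idea (a Lamport clock is the self-managed "time"; real sequence CRDTs use unique per-character IDs rather than integer positions, so this is only illustrative):

```python
# Per-character LWW with a Lamport clock: each replica manages its own
# logical "time"; once all updates are exchanged, the latest write to each
# position wins and all replicas converge. All names are illustrative.

class Replica:
    def __init__(self, rid):
        self.rid = rid
        self.clock = 0                  # logical time, not wall-clock
        self.chars = {}                 # position -> (time, replica_id, char)

    def write(self, pos, ch):
        self.clock += 1
        self.chars[pos] = (self.clock, self.rid, ch)

    def merge(self, other):
        # Latest write per position wins; (time, replica_id) ordering
        # breaks ties deterministically so replicas agree.
        for pos, entry in other.chars.items():
            if pos not in self.chars or entry > self.chars[pos]:
                self.chars[pos] = entry
        # Lamport rule: advance our clock past anything we've seen.
        self.clock = max(self.clock, other.clock)

a, b = Replica("a"), Replica("b")
a.write(0, "x")                    # time 1 at replica a
b.write(0, "y"); b.write(0, "z")   # times 1, 2 at replica b
a.merge(b); b.merge(a)
assert a.chars == b.chars          # both replicas converge
assert a.chars[0][2] == "z"        # b's later logical write wins
```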

lifeformed 5 hours ago||
My issue with git is handling non-text files, which is a common issue with game development. git-lfs is okay but it has some tricky quirks, and you end up with lots of bloat, and you can't merge. I don't really have an answer to how to improve it, but it would be nice if there was some innovation in that area too.
samuelstros 3 hours ago||
Improving on "git not handling non-text files" means adding a semantic-understanding step, i.e. a parse step, in between writing the file and tracking it.

Take a docx, write the file, parse it into entities e.g. paragraph, table, etc. and track changes on those entities instead of the binary blob. You can apply the same logic to files used in game development.
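A minimal sketch of that "parse, then track entities" idea (the entity model and `diff` here are invented for illustration, not lix's actual API): instead of diffing the document as one opaque blob, parse it into entities and diff those.

```python
# Entity-level diff sketch: parse a document into keyed entities, then
# report changes per entity instead of per byte. Toy format: "id:body" lines.

def parse(doc: str) -> dict:
    """Toy parser: each line is one entity, keyed by its 'id' prefix."""
    entities = {}
    for line in doc.splitlines():
        eid, _, body = line.partition(":")
        entities[eid] = body
    return entities

def diff(old: str, new: str) -> list:
    a, b = parse(old), parse(new)
    changes = []
    for eid in a.keys() | b.keys():
        if eid not in b:
            changes.append(("delete", eid))
        elif eid not in a:
            changes.append(("insert", eid, b[eid]))
        elif a[eid] != b[eid]:
            changes.append(("update", eid, b[eid]))
    return changes

old = "p1:Hello\np2:World"
new = "p1:Hello\np2:Universe\np3:!"
assert sorted(diff(old, new)) == [("insert", "p3", "!"),
                                  ("update", "p2", "Universe")]
```

A real docx or game-asset parser replaces the toy `parse`; the tracking layer stays the same.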

The hard part is making this fast enough. But I am working on this with lix [0].

[0] https://github.com/opral/lix

gregschoeninger 3 hours ago|||
We're working on this project to help with the non-text file and large file problem: https://github.com/Oxen-AI/Oxen

Started with the machine learning use case for datasets and model weights but seeing a lot of traction in gaming as well.

Always open for feedback and ideas to improve if you want to take it for a spin!

jayd16 4 hours ago|||
Totally agree. Trying to flesh out Unreal's git plugin really shows how far from ideal git is.

Partial checkouts are awkward at best, LFS locks are somehow still buggy, and the CLI doesn't support batched updates. Checking the status of a remote branch vs your local (to prevent conflicts) is at best naive polling.

Better rebase would be a nice to have but there's still so much left to improve for trunk based dev.

rectang 4 hours ago|||
Has there ever been a consideration for the git file format to allow storage of binary blobs uncompressed?

When I was screwing around with the Git file format, tricks I would use to save space like hard-linking or memory-mapping couldn't work, because data is always stored compressed after a header.

A general copy-on-write approach to save checkout space is presumably impossible, but I wonder what other people who have traveled down similar paths have concluded.

zahlman 4 hours ago|||
What strategies would you like to use to diff the binaries? Or else how are you going to avoid bloat?

Is it actually okay to try to merge changes to binaries? If two people modify, say, different regions of an image file (even in PNG or another lossless compression format), the sum of the visual changes isn't necessarily equal to the sum of the byte-level changes.

miloignis 5 hours ago||
I really think something like Xet is a better idea to augment Git than LFS, though it seems to pretty much only be used by HuggingFace for ML model storage, and I think their git plugin was deprecated? Too bad if it ends up only serving the HuggingFace niche.
echrisinger 2 hours ago||
Has anyone considered a VCS that integrates more vertically with the source code through ASTs?

IE if I change something in my data model, that change & context could be surfaced with agentic tooling.

phtrivier 4 hours ago||
A suggestion: is there any info to provide in diffs that is faster to parse than "left" and "right"? Can the system have enough data to print "bob@foo.bar changed this"?
lasgawe 4 hours ago||
This is a really interesting and well thought out idea, especially the way it turns conflicts into something informative instead of blocking. The improved conflict display alone makes it much easier to understand what actually happened. I think using CRDTs to guarantee merges always succeed while still keeping useful history feels like a strong direction for version control. Looks like a solid concept!
alunchbox 4 hours ago|
Jujutsu honestly is the future IMO. It already does what you have outlined, but solved in a different way with merges: it'll let you merge, but flags that you have conflicts that need to be resolved, for instance.

It's been amazing watching it grow over the last few years.

aduwah 3 hours ago|
The only reason I have not defaulted to jj already is the inability to be messy with it. Easy to make mistakes without "git add"
llyama 2 hours ago|||
You can be messy. The lack of an explicit staging area doesn't restrict that. `jj commit` gives the same mental model for "I want to commit 2 files from the 5 I've changed".
dzaima 2 hours ago|||
But you do have the op log, giving you a full snapshot of the repo (incl. the contents of the workspace) at every operation, so you can get out of such mistakes with some finagling.

You can choose to have a workflow where you're never directly editing any commit to "gain back autonomy" of the working copy; and if you really want to, with some scripting, you can even emulate a staging area with a specially-formatted commit below the working copy commit.
