[1] https://stackoverflow.com/questions/55998614/merge-made-by-r...
https://www.mercurial-scm.org/pipermail/mercurial/2012-Janua...
> Jujutsu keeps track of conflicts as first-class objects in its model; they are first-class in the same way commits are, while alternatives like Git simply think of conflicts as textual diffs. While not as rigorous as systems like Darcs (which is based on a formalized theory of patches, as opposed to snapshots), the effect is that many forms of conflict resolution can be performed and propagated automatically.
Take out the last "/timeline" component of the URL to clone via Fossil: https://chiselapp.com/user/chungy/repository/test/timeline
See also, the upstream documentation on branches and merging: https://fossil-scm.org/home/doc/trunk/www/branching.wiki
I always forget all the flags and I work with literally just: clone, branch, checkout, push.
(Each feature is a fresh branch tho)
Ends up being circular if the author used LLM help for this writeup, though there are no obvious signs of that.
Maybe that's obvious to most people, but it was a bit surprising to see it myself. It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.
The article doesn't contain any LLM output. I use LLMs to ask for advice on coding conventions (especially in Rust, since I'm bad at it), and sometimes as part of research (zstd was suggested by ChatGPT along with comparisons to similar algorithms).
    (block_ai) {
        @ai_bots {
            header_regexp User-Agent (?i)(anthropic-ai|ClaudeBot|Claude-Web|Claude-SearchBot|GPTBot|ChatGPT-User|Google-Extended|CCBot|PerplexityBot|ImagesiftBot)
        }
        abort @ai_bots
    }
Then, in a specific app block, include it via import block_ai (a minimal usage sketch follows below).
Blocking OpenAI IPs did wonders for the ambient noise levels in my apartment. They're not the only ones, obviously, but they're the only ones I had to block to stay sane.
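For illustration, here's a minimal sketch of what that import could look like inside a site block; example.com and the reverse_proxy target are placeholders, not taken from the original comment:

    example.com {
        # pull in the bot-blocking snippet defined above
        import block_ai
        # placeholder backend for whatever app is being served
        reverse_proxy localhost:8080
    }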
A kind of "they found this code, therefore you have a duty not to poison their model as they take it." Meanwhile if I scrape a website and discover data I'm not supposed to see (e.g. bank details being publicly visible) then I will go to jail for pointing it out. :(
Living in a country with hundreds of millions of other civilians or a city with tens of thousands means compromising what you're allowed to do when it affects other people.
There's a reason we have attractive nuisance laws and you aren't allowed to put a slide in your yard that electrocutes anyone who touches it.
None of this, of course, applies to "poisoning" llms, that's whatever. But all your examples involved actual humans being attacked, not some database.
> It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.
That's very much expected. That's why the quality of LLM coding agents is like it is. (No offense.)
The "asking LLMs for advice" part is where the circular aspect starts to come into the picture. Not worse than looking at StackOverflow though which then links to other people who in turn turned to StackOverflow for advice.
For most people throughout history, whatever is presented to you is what you believe is the right answer. AI just brings that source information faster, so what you're seeing is mostly the usual behavior, only accelerated. Before AI, people wouldn't have bothered to try and figure out an answer to some of these questions; it would've been too much work.
One of the funniest things I've started to notice from Gemini in particular is that, in random situations, it speaks English with an agreeable affect that I can only describe as... Indian? I've never noticed such a thing leak through before. There must be a ton of people in India generating new datasets for training.
I wish I could find it again, if someone else knows the link please post it!
I do know that LLMs generate content heavy with those constructs, but they didn't create the ideas out of thin air; they were in the training set, and were prevalent enough that LLMs treat them as commonplace/best practice.
Great argument for not using AI-assisted tools to write blog posts (especially if you DO use these tools). I wonder how much we're taking for granted in these early phases before it starts to eat itself.
For others, I highly recommend Git from the Bottom Up[1]. It is a very well-written piece on internal data structures and does a great job of demystifying the opaque git commands that most beginners blindly follow. Best thing you'll learn in 20ish minutes.
[0]: https://tom.preston-werner.com/2009/05/19/the-git-parable
Content-based chunking like XetHub uses really should become the default. It's not like it's new, either; rsync is based on it.
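To make the idea concrete, here is a minimal sketch of content-defined chunking with a Gear-style rolling hash (the same general idea behind rsync- and FastCDC-style chunkers). The table seeding, mask, and size limits are illustrative assumptions, not how XetHub or rsync actually configure it:

    // Content-defined chunking: cut boundaries where a rolling hash of the
    // content matches a pattern, so boundaries survive insertions/deletions
    // elsewhere in the file and unchanged chunks deduplicate across versions.
    package main

    import (
        "bytes"
        "fmt"
        "math/rand"
    )

    // gear maps each byte value to a pseudo-random word; it drives the hash.
    var gear [256]uint64

    func init() {
        rng := rand.New(rand.NewSource(1)) // fixed seed: reproducible boundaries
        for i := range gear {
            gear[i] = rng.Uint64()
        }
    }

    // chunk cuts data wherever the rolling hash has its low maskBits bits equal
    // to zero, so boundaries depend on content rather than byte offsets. The
    // average chunk size is roughly 2^maskBits bytes, bounded by min/max.
    func chunk(data []byte, maskBits uint, minSize, maxSize int) [][]byte {
        mask := uint64(1)<<maskBits - 1
        var chunks [][]byte
        start := 0
        var h uint64
        for i, b := range data {
            h = h<<1 + gear[b]
            size := i - start + 1
            if (size >= minSize && h&mask == 0) || size >= maxSize {
                chunks = append(chunks, data[start:i+1])
                start = i + 1
                h = 0
            }
        }
        if start < len(data) {
            chunks = append(chunks, data[start:])
        }
        return chunks
    }

    func main() {
        // With repeated content, an insertion near the front only perturbs the
        // chunks around the edit; later boundaries stay put, which is what
        // makes dedup across file versions work.
        data := bytes.Repeat([]byte("some fairly long repeated content ... "), 100)
        for i, c := range chunk(data, 8, 64, 4096) {
            fmt.Printf("chunk %d: %d bytes\n", i, len(c))
        }
    }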
Notable differences: E2E encryption, parallel imports (Got will light up all your cores), and a data structure that supports large files and directories.
Yeah, totally agree. Got has not solved conflict resolution for arbitrary files. However, we can tell the user where the files differ, and that the file has changed.
There is still value in being able to import files and directories of arbitrary sizes, and having the data encrypted. This is the necessary infrastructure to be able to do distributed version control on large amounts of private data. You can't do that easily with Git. It's very clunky even with remote helpers and LFS.
I talk about that in the Why Got? section of the docs.
I think that, theoretically, Git's delta compression is still a lot more optimized for smaller repos. But for bigger repos where sharded storage is required, path-based delta dictionary compression does much better. Git recently (in the last year) got something called "path-walk", which is fairly similar, though.
Bookmarked for later
It's only in the context of recreating Git that this comment makes sense.
I had a go at it as well a while back; I call it "shit": https://github.com/emanueldonalds/shit