Posted by plesiv 2 days ago
Did they get a waiver from the git team to name it as such?
Per the trademark policy, new “git${SUFFIX}” names aren’t allowed: https://git-scm.com/about/trademark
>> In addition, you may not use any of the Marks as a syllable in a new word or as part of a portmanteau (e.g., "Gitalicious", "Gitpedia") used as a mark for a third-party product or service without Conservancy's written permission. For the avoidance of doubt, this provision applies even to third-party marks that use the Marks as a syllable or as part of a portmanteau to refer to a product or service's use of Git code.
There can't be trademark infringement unless there is a likelihood of confusion.
Where does this fit into a product? Maybe I am blind, but while this is cool, I don't really see where I would want this.
What if these bug reporting platforms could create a branch and tag it for each issue?
This would be particularly useful for point-in-time things where you have an immutable deployment branch. It could create a branch off that immutable deployment branch and tag it, so you always have a point-in-time code reference for bugs.
Would that be useful? I feel like what you’re doing here isn’t that different if I get what’s going on (basically creating one repository per bug?)
There are various repositories with 500k+ commits
So I can easily see why having many branches is more storage than the same number of commits.
I like this system in general, but I don't understand why scaling the number of repos is treated as a pinch point? Are there git hosts that struggle with the number of repos hosted in particular? (I don't think the "Motivation" section answers this, either.)
There are many reasons not to do this; perhaps this chips away at one of them.
It’s unlikely any Git providers struggle with the number of repos they're hosting, but most are larger companies.
Currently, we're a bootstrapped team of 2. I think our approach changes the kind of product we can build as a small team.
Unless, of course, your product is infinite git repos with cf workers.
You can still sync to a platform like GitHub or BitBucket after all users close their tabs.
A long time ago, I looked into using isomorphic-git with lightning-fs to build a light note-taking app in the browser: pull your markdown files in, edit them in a rich-text editor a la Notion, then stage and commit changes back using git.
> We ended up creating our own Emscripten filesystem on top of Durable Objects, which we call DOFS.
> We abandoned the porting efforts and ended up implementing the missing Git server functionality ourselves by leveraging libgit2’s core functionality, studying all available documentation, and painstakingly investigating Git’s behavior.
Using a ton of great open source & taking it all further. Would sure be great if y'all could contribute some of this forward!
Libgit2 is GPL with Linking Exception, and Emscripten MIT so I think legally everything is in the clear. But it sure would be such a boon to share.
I believe our changes are solid, but they’re tailored specifically to our use case and can’t be merged as-is. For example, our modifications to libgit2 would need at least as much additional code to make them toggleable in the build process, which requires extra effort.
Very cool project. I hope Cloudflare Workers can support more protocols like SSH and gRPC. It's one of the reasons why I prefer Fly.io over Cloudflare Workers for special servers like this.
Choose these values:
* P, pack "Planck" size, e.g. 100kB
* N, branching factor, e.g. 8
After each write:
1. iterate over each pack (pack size is S) and assign each pack a class C which is the smallest integer that satisfies P * N^C > S
2. iterate variable c from 0 to the maximum value of C that you got in step 1
* if there are N packs of class c, repack them into a new pack, new pack is going to be at most of class c+1
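The steps above can be sketched roughly as follows. This is only a simulation on pack sizes, not a real repack: the function names are made up for illustration, merging is modelled as summing sizes (an upper bound, since a real repack may deduplicate), and the maximum class is re-evaluated on every pass so that freshly merged packs can cascade upward.

```python
def pack_class(size, planck, factor):
    """Smallest integer C >= 0 such that planck * factor**C > size."""
    c = 0
    while planck * factor ** c <= size:
        c += 1
    return c


def compact(packs, planck=100_000, factor=8):
    """Return the list of pack sizes after one compaction pass.

    `packs` holds pack sizes in bytes. Whenever `factor` packs share a
    class, they are replaced by one pack whose size is their sum.
    """
    packs = list(packs)
    c = 0
    # Re-check the maximum class each time, because merges create larger packs.
    while packs and c <= max(pack_class(s, planck, factor) for s in packs):
        same = [s for s in packs if pack_class(s, planck, factor) == c]
        while len(same) >= factor:
            group, same = same[:factor], same[factor:]
            for s in group:
                packs.remove(s)
            # The merged pack is at most class c + 1.
            packs.append(sum(group))
        c += 1
    return packs
```

With the example values (P = 100 kB, N = 8), eight 50 kB packs collapse into a single 400 kB pack of class 1, while seven of them are left alone until the eighth write arrives.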
If only 20% of the content gets changed, the rolling hash that Borg does to chunk files could identify the 80% common parts and then with its deduplication it would store just a single compressed copy of those chunks. And as a bonus, it's designed for handling historical data.
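To make the chunk-and-deduplicate idea concrete, here is a toy sketch. It is not Borg's actual chunker (Borg uses a different rolling hash and much larger chunks); the rolling sum, the `window`, `mask`, and `min_size` parameters, and the function names are all assumptions chosen to keep the example short.

```python
import hashlib


def split_chunks(data, window=16, mask=0x3F, min_size=32):
    """Cut `data` where a rolling sum over the last `window` bytes hits `mask`.

    Boundaries depend only on nearby content, so unchanged regions tend to
    produce the same chunks even when other parts of the file change.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i + 1 - start >= min_size and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def store(chunks, repo):
    """Store each chunk once, keyed by its SHA-256; return the ordered ids."""
    ids = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        repo.setdefault(digest, chunk)  # identical chunks are kept only once
        ids.append(digest)
    return ids
```

A file version is then just its list of chunk ids, and storing a second version only adds the chunks that are actually new to `repo`.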
adamc@router> show arp | display xml
<rpc-reply xmlns:JUNOS="http://xml.juniper.net/JUNOS/15.1F6/JUNOS">
    <arp-table-information xmlns="http://xml.juniper.net/JUNOS/15.1F6/JUNOS-arp" JUNOS:style="normal">
        <arp-table-entry>
            <mac-address>0a:00:27:00:00:00</mac-address>
            <ip-address>10.0.201.1</ip-address>
            <hostname>adamc-mac</hostname>
            <interface-name>em0.0</interface-name>
            <arp-table-entry-flags>
                <none/>
            </arp-table-entry-flags>
        </arp-table-entry>
    </arp-table-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>
EXCEPT/INTERSECT make this easy for a bunch of columns (excluding the times, of course; I usually hash these for performance reasons), but they won't tell you what the difference actually is. You have to do column-by-column comparisons for that, which is where I usually shell out to my language of choice, because SQL sucks at doing that.
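A minimal sketch of that split, assuming two snapshot tables with identical columns and an `id` key. The table names `old_snapshot` and `new_snapshot` and the helper name are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3


def changed_columns(conn, key="id"):
    """Use EXCEPT to find changed or new rows, then diff them per column."""
    conn.row_factory = sqlite3.Row
    # Rows present in the new snapshot that have no identical row in the old one.
    rows = conn.execute(
        "SELECT * FROM new_snapshot EXCEPT SELECT * FROM old_snapshot"
    ).fetchall()
    diffs = {}
    for new_row in rows:
        old_row = conn.execute(
            f"SELECT * FROM old_snapshot WHERE {key} = ?", (new_row[key],)
        ).fetchone()
        if old_row is None:
            diffs[new_row[key]] = "added"
            continue
        # The column-by-column part that is awkward to express in SQL.
        diffs[new_row[key]] = {
            col: (old_row[col], new_row[col])
            for col in new_row.keys()
            if old_row[col] != new_row[col]
        }
    return diffs
```

Swapping the two tables in the EXCEPT query would surface deleted rows in the same way.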
...daily? monthly? how many versions do you have to keep around?
I'd look at a simple zstd dictionary based scheme, first. Put your history/metadata into a database. Put the XML data into file system/S3/BackBlaze/B2, zstd compressed against a dictionary.
Create the dictionary: zstd --train PathToTrainingSet/* -o dictionaryName
Compress with the dictionary: zstd FILE -D dictionaryName
Decompress with the dictionary: zstd --decompress FILE.zst -D dictionaryName
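If you want to try the dictionary idea from code without installing zstd bindings, the same principle can be sketched with the preset-dictionary support in Python's standard `zlib` module. This is a stand-in, not zstd: zlib has no training step, so the "dictionary" here is simply bytes you pick yourself (for example, a representative XML file), and the helper names are made up.

```python
import zlib


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Compress `data`, letting it reference byte sequences from `dictionary`."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress; the exact same dictionary bytes must be supplied."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()
```

As with zstd, the dictionary has to be kept around for as long as any file compressed against it, so it is worth versioning it alongside the metadata in the database.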
Although you say you're fine with it being not that storage efficient to a degree, I think if you were OK with storing every version of every XML file, uncompressed, you wouldn't have to ask right?
Git does not store diffs as its logical model; it stores every version as a full object. These get packed into packfiles (https://git-scm.com/book/en/v2/Git-Internals-Packfiles), which delta-compress similar objects against each other and use zlib.
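For a feel of what "stores every version" means, here is a small sketch of how a single file version becomes a loose blob object, following the layout described in the Git Internals chapter of the Pro Git book: a `blob <size>` header, a NUL byte, then the content, hashed with SHA-1 and compressed with zlib. The function name is made up, and packfile delta compression is not shown.

```python
import hashlib
import zlib


def git_blob(content: bytes):
    """Return (object id, zlib-compressed bytes) for a blob holding `content`."""
    # Header and content are hashed together; the whole thing is what gets stored.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    return object_id, zlib.compress(store)
```

Two versions of a file that differ by a single byte get two different ids and two full objects; the space savings only come later, when Git packs them.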
As a hobbyist, “free” is pretty appealing. I’m pretty sure my repos on GitHub won’t cost me anything, and that’s unlikely to change anytime soon. Not sure about the new stuff.
This could be a fantastic building block for headless CMS and the like.
what if there are two users who want to access the same DO repo at the same time, one in the US and the other in Singapore? the DO must live either in US servers or SG servers, but not both at the same time. so one of the two users must have high latency then?
then after some time, a user in Australia accesses this DO repo - the DO bounces to AU servers - US and SG users will have high latency?
but please correct me if i'm wrong