Posted by plesiv 10/25/2024

Infinite Git repos on Cloudflare workers (gitlip.com)
144 points | 90 comments | page 2
tredre3 10/25/2024|
> Wanting to avoid managing the servers ourselves, we experimented with a serverless approach.

I must be getting old, but building a gigantic house of cards of interlinked components only to arrive at a more limited solution is truly bizarre to me.

The maintenance burden for a VPS: periodically run apt update upgrade. Use filesystem snapshots to create periodic backups. If something happens to your provider, spin up a new VM elsewhere with your last snapshot.

The maintenance burden for your solution: Periodically merge upstream libgit2 in your custom fork, maintain your custom git server code and audit it for vulnerabilities, make sure everything still compiles with emscripten, deploy it. Rotate API keys to make sure your database service can talk to your storage service and your worker service. Then I don't even know how you'd backup all this to get it back online quickly if something happened to Cloudflare. And all that only to end up with worse latency than a VPS, and more size constraints on the repo and objects.

But hey, at least it scales infinitely!

notamy 10/25/2024||
> The maintenance burden for a VPS: periodically run apt update upgrade. Use filesystem snapshots to create periodic backups. If something happens to your provider, spin up a new VM elsewhere with your last snapshot.

And make sure it reboots for kernel upgrades (or set up live-patching), and make sure that service updates don't go wrong[0], and make sure that your backups work consistently, and make sure that you're able to vertically or horizontally scale, and make sure it's all automated and repeatable, and make sure the automation is following best-practices, and make sure you're not accidentally configuring any services to be vulnerable[1], and ...

Making this stuff be someone else's problem by using managed services is a lot easier, especially with a smaller team, because then you can focus on what you're building and not making sure your SPOF VPS is still running correctly.

[0] I self-host some stuff for a side-project right now, and package updates are miserable because they're not simply `apt-get update && apt-get upgrade`. Instead, the documented upgrade process for some services is more or less "dump the entire DB, stop the service, rm -rf the old DB, upgrade the service package, start the service, load the dump in, hope it works."

[1] It's easy to configure something insecurely because the insecure setting is the more convenient one, even if the resulting vulnerability is unintentional.

kentonv 10/25/2024||
> Periodically merge upstream libgit2 in your custom fork, maintain your custom git server code and audit it for vulnerabilities, make sure everything still compiles with emscripten, deploy it.

There's only a difference here because there exist off-the-shelf git packages for traditional VPS environments but there do not yet exist off-the-shelf git packages for serverless stacks. The OP is a pioneer here. The work they are doing is what will eventually make this an off-the-shelf thing for everyone else.

> Rotate API keys to make sure your database service can talk to your storage service and your worker service.

Huh? With Durable Objects the storage is local to each object. There is no API key involved in accessing it.

> Then I don't even know how you'd backup all this

Durable Object storage (under the new beta storage engine) automatically gives you point-in-time recovery to any point in time in the last 30 days.

https://developers.cloudflare.com/durable-objects/api/storag...
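A minimal sketch of what such a restore might look like, assuming a SQLite-backed Durable Object whose storage exposes the bookmark methods described on the page linked above. The `PitrStorage` interface is a hand-written slice of that API, and `isWithinPitrWindow` and `scheduleRestore` are hypothetical helpers, so check the exact signatures against the documentation before relying on them.

```typescript
// Hand-written slice of the Durable Object storage API used for
// point-in-time recovery. Method names are taken from Cloudflare's docs;
// the exact signatures here are an assumption.
interface PitrStorage {
  getBookmarkForTime(timestamp: number | Date): Promise<string>;
  onNextSessionRestoreBookmark(bookmark: string): Promise<string>;
}

// The 30-day window mentioned in the comment above.
const PITR_WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

// Hypothetical helper: is `when` in the past and inside the window?
function isWithinPitrWindow(when: Date, now: Date): boolean {
  const ageMs = now.getTime() - when.getTime();
  return ageMs >= 0 && ageMs <= PITR_WINDOW_MS;
}

// Hypothetical helper: ask the object to roll back to `when` the next
// time it starts. Resolves to the bookmark returned by the storage call,
// which (per the docs) can be used to undo the rollback.
async function scheduleRestore(
  storage: PitrStorage,
  when: Date,
  now: Date = new Date(),
): Promise<string> {
  if (!isWithinPitrWindow(when, now)) {
    throw new RangeError("target time is outside the recovery window");
  }
  const target = await storage.getBookmarkForTime(when);
  return storage.onNextSessionRestoreBookmark(target);
}
```

Inside a Durable Object this would presumably be called with `this.ctx.storage`, followed by restarting the object so that the restore takes effect.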

> And all that only to end up with worse latency than a VPS

Why would it be worse? It should be better, because Cloudflare can locate each DO (git repo) close to whoever is accessing it, whereas your VPS is going to sit in one single central location that's probably further away.

> and more size constraints on the repo and objects.

While each individual repo may be more constrained, this solution can scale to far more total repos than a single-server VPS could.

(I'm the tech lead for Cloudflare Workers.)

Spunkie 10/25/2024||
I've been wondering what to do to back up our GitHub repos other than keeping a local copy and/or dumping them on something like S3.

I would love to use this to serve as a live/working automatic backup for my GitHub repos on CF infrastructure.
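One way a mirror-style backup like this could work is sketched below: the standard `git clone --mirror` and `git push --mirror` commands, built as argument lists. The helper names and the URLs in the usage note are hypothetical, and whether a given backup host accepts every mirrored ref is an assumption.

```typescript
// One git invocation as an argv array, so no shell quoting is needed.
type GitArgs = string[];

// Hypothetical helper: first-time setup. Clone every ref from the source,
// point pushes at the backup remote, then push everything there.
function initialMirror(sourceUrl: string, backupUrl: string, dir: string): GitArgs[] {
  return [
    ["clone", "--mirror", sourceUrl, dir],
    ["-C", dir, "remote", "set-url", "--push", "origin", backupUrl],
    ["-C", dir, "push", "--mirror", "origin"],
  ];
}

// Hypothetical helper: periodic refresh. Fetch new refs (pruning deleted
// ones) from the source, then push the result to the backup.
function refreshMirror(dir: string): GitArgs[] {
  return [
    ["-C", dir, "fetch", "--prune", "origin"],
    ["-C", dir, "push", "--mirror", "origin"],
  ];
}
```

Each array can be handed to something like Node's `execFileSync("git", args)`, for example `initialMirror("https://github.com/example/repo.git", "https://backup.example/repo.git", "repo.git")` once, then `refreshMirror("repo.git")` on a schedule.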

yellow_lead 10/25/2024||
The latency on the examples seems quite slow, around 7 seconds to a full load for me.

https://gitlip.com/@nataliemarleny/test-repo

plesiv 10/25/2024|
OP here. That’s expected for now, and we’re working on a solution. We didn’t explain the reason in the post because we plan to cover it in a separate write-up.
yellow_lead 10/25/2024||
I see you haven't launched yet so that's fair. Looking forward to trying it
ericyd 10/25/2024||
Engaging read! For me, just the right balance of technical detail and narrative content. It's a hard balance to strike and I'm sure preferences vary widely which makes it an impossible target for every audience.
csomar 10/26/2024||
This piqued my interest as I am working on a Git product and using Cloudflare Workers for most of my back-end. I looked through the options, but the hard memory limit for Cloudflare Workers and the fact that most interesting repos (that is, those of companies you want to sell to) are in the GBs means the platform is not fit for this.

I am ending up with AWS Lambda. Not only does that solve the Wasm issue, but you can have up to 10 GB of memory on a single instance. That is close to enough for most use cases. 100 MB? Not really.

gkoberger 10/25/2024||
This is really cool! I've been building something on libgit2 + EFS, and this approach is really interesting.

Between libgit2 on emscripten, the number of file writes to DO, etc, how is performance?

markphip 10/25/2024||
I wonder if they considered or looked at using JGit? https://github.com/eclipse-jgit/jgit

It provides client and server API. The latter is used by Gerrit for its server. https://www.gerritcodereview.com

Not sure what the Java to WASM story is if that is a requirement for what they need.

stavros 10/25/2024||
This is a very impressive technical achievement, and it's clear that a lot of work went into it.

Unfortunately, the entrepreneur in me continues that thought with "work that could have gone into finding customers instead". Now you have a system that could store "infinite" git repos, but how many customers?

deadbunny 10/25/2024||
TFA is literally marketing for them. HN is their target audience and a good way to capture that audience is to show them something technically interesting.
akerl_ 10/25/2024||
I agree. That’s a really unfortunate way to view somebody’s project.
stavros 10/25/2024||
Is it a project, or is it a company?
akerl_ 10/25/2024||
This feels like you're trying to ask rhetorically, but... Cloudflare hosted a program for startups to build things on Cloudflare Workers. This is a project that somebody built as part of that program, which they're now launching as a product.
stavros 10/25/2024||
Ah, that explains it, thank you.
scosman 10/25/2024||
Serverless git repos: super cool

But I can't figure out what makes this an AI company. Seems like a collaboration tool?

plesiv 10/25/2024|
OP here. We're not an AI company; we're aiming to be AI-adjacent and simplify the practical application of AI models.
ijamj 10/25/2024||
Honest question: how is this "AI-adjacent"? How does it specifically "simplify the practical application of AI models"? Focus of the question being on "AI"...
iampims 10/25/2024|
Some serious engineering here. Kudos!