
Posted by Simpliplant 3 hours ago

GitHub Is Having Issues (www.githubstatus.com)
181 points | 117 comments
terminalbraid 2 hours ago|
I would prefer we have posts when github is not having issues to cut down on noise.
duggan 1 hour ago||
A directory over SSH can be your git server. If your CI isn't too complex, a post-receive hook looping into Docker can be enough. I wrote about self-hosting git and builds a few weeks ago [1].

There are heavier solutions, but even setting something like this up as a backstop might be useful. If your blog is being hammered by ChatGPT traffic, spare a thought for Github. I can only imagine their traffic has ballooned phenomenally.

1: https://duggan.ie/posts/self-hosting-git-and-builds-without-...
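To make the "directory over SSH" idea concrete, here is a minimal sketch. In real use the bare repo lives on a server and the remote URL is an ssh:// address; the paths here are local so the steps can be tried end to end, and the checkout step stands in for "loop into Docker":

```shell
set -e

# "Server" side: a bare repository is all a git remote needs.
git init --bare origin.git

# A post-receive hook that checks the pushed branch out into a work
# tree (a stand-in for running a build or docker step). Hooks run with
# the bare repo as the working directory.
cat > origin.git/hooks/post-receive <<'EOF'
#!/bin/sh
set -e
mkdir -p ../worktree
git --work-tree=../worktree --git-dir=. checkout -f main
EOF
chmod +x origin.git/hooks/post-receive

# "Client" side: commit and push. Over the network the remote URL
# would be something like ssh://user@host/srv/git/origin.git instead.
git init -b main clone1
cd clone1
git config user.email dev@example.com
git config user.name dev
echo hello > app.txt
git add app.txt
git commit -m "first"
git remote add backstop ../origin.git
git push backstop main
cd ..
```

After the push, `worktree/app.txt` exists on the "server" side, deployed by the hook.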

arianvanp 49 minutes ago|
Doesn't post-receive block the push operation and get cancelled when you cancel the push?
duggan 39 minutes ago||
It does; you're just running a command over SSH, so if you have a particularly long build then something more involved may make more sense.
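One "more involved" option is to have post-receive hand the slow work off to a background process, so the push returns immediately. A sketch (the echo stands in for a real build command, and the hook is invoked by hand here to show the stdin format git feeds it):

```shell
set -e
mkdir -p demo/hooks

cat > demo/hooks/post-receive <<'EOF'
#!/bin/sh
# Each pushed ref arrives on stdin as "<old-sha> <new-sha> <refname>".
read oldrev newrev refname
# Detach the slow part; the push returns as soon as the hook exits.
nohup sh -c "echo building $newrev" >> build.log 2>&1 &
EOF
chmod +x demo/hooks/post-receive

# Simulate git invoking the hook after a push.
cd demo
echo "0000 abcd refs/heads/main" | ./hooks/post-receive
cd ..
```

The push completes while the build runs in the background and logs to `build.log`.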
VorpalWay 1 minute ago||
Most builds take a long time, at least in C++ and Rust (the two languages I work in). And from what I have seen of people working in Python, the builds aren't fast there either (far faster of course, but still easily a minute or two).

Also, how would PRs and code review be handled?

Your suggestion really only makes sense for a small single developer hobby project in an interpreted language. Which, if that is what you intended, fair enough. But there really wasn't enough context to ascertain that.

shykes 3 hours ago||
In moments like this, it's useful to have a "break glass" mode in your CI tooling: a way to run a production CI pipeline from scratch, when your production CI infrastructure is down. Otherwise, if your CI downtime coincides with other production downtime, you might find yourself with a "bricked" platform. I've seen it happen and it is not fun.

It can be a pain to set up a break-glass, especially if you have a lot of legacy CI cruft to deal with. But it pays off in spades during outages.

I'm biased because we (dagger.io) provide tooling that makes this break-glass setup easier, by decoupling the CI logic from CI infrastructure. But it doesn't matter what tools you use: just make sure you can run a bootstrap CI pipeline from your local machine. You'll thank me later.

nadirollo 1 hour ago||
This is a must when your systems deal with critical workloads. At Fastly, we process a good chunk of the internet's traffic and can't afford to be "down" while waiting for the CI system to recover in the event of a production outage.

We built a CI platform using dagger.io on top of GH Actions, and the "break glass" pattern was not an afterthought; it was a requirement (and one of the main reasons we chose dagger as the underlying foundation of the platform in the first place).

hinkley 26 minutes ago|||
It’s a hard sell. I always get blank looks when I suggest it, and often have to work off book to get us there.

I generally recommend that the break glass solution always be pair programmed.

alex_suzuki 2 hours ago|||
100%. We used to design the pipeline in a way that is easily reproducible locally, e.g. one that doesn't rely on plugins of the CI runtime. Think build.sh shell script, normally invoked by the CI runner but just as easy to run locally.
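The shape of that pattern, sketched: one script is the pipeline, and the hosted CI is reduced to invoking it. The step commands in the comments are placeholders for whatever your project actually uses:

```shell
set -e

cat > build.sh <<'EOF'
#!/bin/sh
set -eu
echo "==> lint"
# e.g. shellcheck ./*.sh
echo "==> test"
# e.g. make test
echo "==> package"
# e.g. docker build -t myapp .
echo "build ok"
EOF
chmod +x build.sh

# The CI runner and a laptop run the identical entry point:
./build.sh
```

Because the break-glass path is the normal path, it's exercised on every CI run instead of rotting until the outage you need it for.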
hinkley 24 minutes ago||
My automation is always an escalation of a run book that has gotten very precise and handles corner cases.

Even if I get the idea of an automation before there’s a run book for it.

tomwphillips 2 hours ago||
A while back I think I heard you on a podcast describing these pain points. Experienced them myself; sounded like a compelling solution. I remember Dagger docs being all about AI a year or two ago, and frankly it put me off, but that seems to have gone again. Is your focus back to CI?
shykes 2 hours ago||
Yes, we are re-focused on CI. We heard loud and clear that we should pick a lane: either a runtime for AI agents or deterministic CI. We picked CI.

Ironically, this makes Dagger even more relevant in the age of coding agents: the bottleneck increasingly is not the ability to generate code, but to reliably test it end-to-end. So the more we all rely on coding agents to produce code, the more we will need a deterministic testing layer we can trust. That's what Dagger aspires to be.

For reference, a few other HN threads where we discussed this:

- https://news.ycombinator.com/item?id=46734553

- https://news.ycombinator.com/item?id=46268265

tomwphillips 2 hours ago||
That's good - I'll reconsider Dagger.

Yes, I agree with your assessment. AI means a higher rate of code changes, so you need more robust and fast CI.

ryandrake 53 minutes ago||
Insert the standard comment about how git doesn't even need a hub. The whole point of it is that it's distributed and doesn't need to be "hosted" anywhere. You can push or pull from any repo on anyone's machine. Shouldn't everyone just treat GitHub as an online backup? Zero reason it being down should block development.
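What "distributed" means in practice: any clone can serve as a remote for any other. The sketch below uses two local directories to stand in for two developers' machines; over a network the remote path would be an ssh:// URL to a colleague's box instead:

```shell
set -e

# Two local clones stand in for two developers' machines.
git init -b main alice
( cd alice \
  && git config user.email a@example.com && git config user.name alice \
  && echo v1 > file.txt && git add file.txt && git commit -m v1 )

# Bob pulls directly from Alice, no hub involved. Over a network this
# would be e.g. git clone ssh://alice-laptop/home/alice/project bob
git clone alice bob

# And Alice can fetch Bob's work back the same way:
( cd alice && git remote add bob ../bob && git fetch bob )
```

Nothing central is required for the version control itself; the hub only adds the collaboration layer on top.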
anon7000 48 minutes ago|
The problem is that any kind of automatic code change process like CI, PRs, code review, deployments, etc etc are based on having a central git server. Even security may be based on SSO roles synced to GH allowing access to certain repos.

A self-hosted git server is trivial. Making sure everything built on top of that is able to fallback to that is not. Especially when GH has so many integrations out of the box

2001zhaozhao 7 minutes ago||
Forgejo has all of the features you mentioned and is completely open source!
zthrowaway 3 hours ago||
Microslop ruins everything it touches.
akoumjian 3 hours ago||
Is this related to Cloudflare?

I'm getting cf-mitigated: challenge on openai API requests.

https://www.cloudflarestatus.com/ https://status.openai.com/

pothamk 3 hours ago||
What’s interesting about outages like this is how many things depend on GitHub now beyond just git hosting. CI pipelines, package registries, release automation, deployment triggers, webhooks — a lot of infrastructure quietly assumes GitHub is always available. When GitHub degrades, the blast radius is surprisingly large because it breaks entire build and release chains, not just repo browsing.
littlestymaar 3 hours ago|
> a lot of infrastructure quietly assumes GitHub is always available

Which is really baffling when talking about a service that has at least weekly hiccups even when it's not a complete outage.

There are almost 20 outages listed on HN over the past two months: https://news.ycombinator.com/from?site=githubstatus.com so much for “always available”.

pothamk 3 hours ago||
Part of it is probably historical momentum. GitHub started as “just git hosting,” so a lot of tooling gradually grew around it over the years — Actions, package registries, webhooks, release automation, etc. Once teams start wiring all those pieces together, replacing or decoupling them becomes surprisingly hard, even if everyone knows it’s a single point of failure.
cpfohl 3 hours ago||
I swear this is my fault. I can go weeks without doing infra work. Github does fine, I don't see any hiccups, status page is all green.

But the day comes that I need to tweak a deploy flow, or update our testing infra and about halfway through the task I take the whole thing down. It's gotten to the point where when there's an outage I'm the first person people ask what I'm doing...and it's pretty dang consistent....

aezart 1 hour ago||
Sounds like my Dad, who used to have an uncanny ability to get stuck in elevators. Even got stuck in one with his claustrophobia therapist.
LollipopYakuza 2 hours ago|||
Plot twist: cpfohl works at Github and actually messes with the infra.
sidewndr46 1 hour ago||
Second plot twist: cpfohl actually works at Microsoft on Copilot
wolfi1 3 hours ago|||
do you know the Pauli-Effect? https://en.wikipedia.org/wiki/Pauli_effect
cperciva 1 hour ago|||
Related: In FreeBSD we used to talk often about the Wemm Field. Peter Wemm was one of the early FreeBSD developers and responsible for most of the early project server cluster, and hardware had a phenomenal habit of breaking in his vicinity. One notable story I heard involved transporting servers between data centers and hitting a Christmas tree in the middle of a highway... in March.
macintux 2 hours ago||||
At my old job we’d call that Daily bogons (my last name). Didn’t know I was in such illustrious company.
cpfohl 2 hours ago|||
Brilliant. I love it
hmokiguess 2 hours ago|||
You should be promoted to SRE - Schrodinger Reliability Engineer
trigvi 1 hour ago|||
Simple solution: do infra work every few months instead of every few weeks.
Imustaskforhelp 2 hours ago|||
Just let us know in advance when you want to do infra work from now on, alright?
cpfohl 2 hours ago||
I’ll try. Lemme know if you need a day off too…
Imustaskforhelp 2 hours ago||
I know a guy who knows a guy who might need a day off haha

And they are gonna give a pizza party if I get them a day off. I am gonna share a slice with ya too.

Doing a github worldwide outage by magical quantum entanglement for a slice of pizza? I think I would take that deal! xD.

RGamma 2 hours ago||
Surely this would earn you loads of internet street cred.
joecool1029 3 hours ago||
codeberg might be a little slower on git cli, but at least it's not becoming a weekly 'URL returned error: 500' situation...
popcornricecake 3 hours ago||
These days it feels like people have simply forgotten that you could also just have a bare repository on a VPS and use it over ssh.
hrmtst93837 1 hour ago|||
I've found that a bare repo over SSH is the simplest way to keep control and reduce attack surface, especially when you don't need fancy PR workflows. I ran many projects with git init --bare on a Debian VPS, controlled access with authorized_keys and git-shell, and wrote a post-receive hook that runs docker-compose pull and systemctl restart so pushes actually deploy. The tradeoff is you lose built-in PRs, issue tracking, and easy third party CI, so either add gitolite or Gitea for access and a simple web UI, or accept writing hooks, backups, receive.denyNonFastForwards, and scheduled git gc to avoid surprises at 2AM.
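Two of the hardening pieces mentioned above, sketched (the key, user, and repo path are placeholders; the authorized_keys line is shown as text here but belongs in the git user's `~/.ssh/authorized_keys` on the server):

```shell
set -e

# Refuse force-pushes on the server-side bare repo:
git init --bare project.git
git -C project.git config receive.denyNonFastForwards true

# An authorized_keys entry that pins a key to git-shell, so the key
# can push/pull but not get a general shell or forward ports:
cat > authorized_keys.example <<'EOF'
command="git-shell -c \"$SSH_ORIGINAL_COMMAND\"",no-port-forwarding,no-pty ssh-ed25519 AAAA... dev@example.com
EOF
```

`receive.denyNonFastForwards` is the same setting `git init --shared` applies, and git-shell restricts the key to git-upload-pack/git-receive-pack style commands.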
yoyohello13 3 hours ago|||
Most developers don’t even know git and GitHub are different things…
mynameisvlad 3 hours ago|||
I mean, this isn't a 'URL returned error: 500' situation for anything that Codeberg provides considering this is an issue with Copilot and Actions.
joecool1029 3 hours ago||
Except actually it was, that was what my git client was reporting trying to run a pull.
mynameisvlad 3 hours ago||
I'm going to trust the constant stream of updates from the company itself which shows exactly what went down and came back up rather than a random anecdote.
iovoid 1 hour ago|||
If you look at the incident details it also claims most services were impacted.

> Git Operations is experiencing degraded availability. We are continuing to investigate.

https://www.githubstatus.com/incidents/n07yy1bk6kc4

mananaysiempre 57 minutes ago||||
Recent years have shown this to be the wrong prediction strategy. The reason seems to be an incentive imbalance where there are quite a few reasons for companies to lie (including their own CLAs) and not a lot of repercussions for doing so (everybody competes on lock-in, not on product). Of course, the word-of-mouth approach is also exploitable by dishonest actors, but thus far there doesn’t look to be a lot of exploitation going on, likely because there’s little reason to bother (once again, lock-in is king).
workethics 2 hours ago||||
I only found this post because I decided to check HN after getting HTTP 500 errors pulling some repos.
slopinthebag 50 minutes ago|||
This seems intelligent, after all companies are incapable of making errors in reporting and also have absolutely no incentive to lie about stuff like that. Those 500 errors others have reported as experiencing must have just been the wind.
Imustaskforhelp 2 hours ago|||
I used to use codeberg 2 years ago. I may have been ahead of my time.
ocdtrekkie 3 hours ago|||
I rarely successfully get Codeberg URLs to load. Which is sad because I actually would very much like to recommend it but I find it unreliable as a source.

That being said, GitHub is Microsoft now, known for that Microsoft 360 uptime.

Imustaskforhelp 2 hours ago|||
I have never had this issue. IIRC Codeberg has a Matrix community; they are a non-profit and would absolutely love to hear your feedback.

Here's the link to their community, hope it helps: https://matrix.to/#/#codeberg-space:matrix.org

cyberax 3 hours ago|||
> Microsoft 360 uptime

I mean... It's right in the name! It's up for 360 days a year.

IshKebab 3 hours ago||
I mean... you understand the scale difference right?
duckkg5 3 hours ago|
I would so very much love to see GitHub switch gears from building stuff like Copilot etc. and focus on availability.
adithyareddy 1 hour ago||
The #1 priority at GitHub for this year is migrating from their own data center to Azure, any other work that gets in the way of this is being deprioritized: https://thenewstack.io/github-will-prioritize-migrating-to-a...
multisport 57 minutes ago||
> It’s existential for GitHub to have the ability to scale to meet the demands of AI and Copilot, and Azure is our path forward.

More existential than going down a few times a week?

coffeebeqn 2 hours ago|||
This is an absurd state for them to be in! Weekly outages in 2025 and 2026. The slide from developer-beloved and very solid to Microslop happened faster than I expected.
esseph 1 hour ago||
They may have been beloved before MS bought them. It takes a while for technical debt to catch up.
hrmtst93837 2 hours ago|||
I think GitHub shipping Copilot while suffering availability issues is a rational choice because they get more measurable business upside from a flashy AI product than from another uptime graph. In my experience the only things that force engineering orgs to prioritize uptime are public SLOs with enforced error budgets that can halt rollouts, plus solid observability like Prometheus and OpenTelemetry tracing, canary rollouts behind feature flags, multi-region active-active deployments, and regular chaos experiments to surface regressions. If you want them to change, push for public SLOs or pay for an enterprise SLA, otherwise accept that meaningful uptime improvements cost money and will slow down the flashy stuff.
rschiavone 2 hours ago||
Unless a major out(r)age forces a change of leadership, expect more slop down our throats.