Posted by linolevan 1/19/2026

What came first: the CNAME or the A record? (blog.cloudflare.com)
466 points | 162 comments
mcfedr 1/20/2026|
everything about this reads like an excuse from a team that doesn't want to admit they screwed up

nitpicking at the RFCs when everyone knows DNS is a big old thing with lots going on

how do they not have basic integration tests to check how clients resolve?

it seems very unlike the Cloudflare of old, which was much more up front - there is no talk of the need to improve process, just blaming other people

mintflow 1/20/2026||
After reading the article, I'm wondering: is there really no test case covering the behavior when the CNAME order in the response is modified? It should be simple to run a fleet of various OS/DNS client combinations to test the behavior (rough sketch below).

I was also shocked that a Cisco switch goes into a reboot loop over this DNS ordering issue.
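
A rough sketch of what such a test could look like (pure Python; the records and the resolve() helper are invented for the example, standing in for a real stub resolver): exercise every ordering of the answer section and require the same result.

    import itertools

    # A minimal answer section: a CNAME chain plus the terminal A record.
    ANSWERS = [
        ("www.example.com", "CNAME", "cdn.example.net"),
        ("cdn.example.net", "CNAME", "edge.example.org"),
        ("edge.example.org", "A", "192.0.2.1"),
    ]

    def resolve(qname, answers):
        # Chase CNAMEs by name, not by position in the answer list.
        name, seen = qname, set()
        while True:
            if name in seen:
                raise ValueError("CNAME loop")   # loop protection
            seen.add(name)
            a = [v for (n, t, v) in answers if n == name and t == "A"]
            if a:
                return a[0]
            cn = [v for (n, t, v) in answers if n == name and t == "CNAME"]
            if not cn:
                raise LookupError("no data for " + name)
            name = cn[0]

    # Every permutation of the answer section must resolve identically.
    for perm in itertools.permutations(ANSWERS):
        assert resolve("www.example.com", list(perm)) == "192.0.2.1"

Running a matrix of real stub resolvers (glibc, the various OSes, the Cisco gear) against shuffled responses would be the heavier, more honest version of the same check.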

ShroudedNight 1/19/2026||
I'm not an IETF process expert. Would this be worth filing errata against the original RFC in addition to their new proposed update?

Also, what's the right mental framework for deciding when to release a patch RFC versus obsoleting the old standard with a comprehensive update?

hdjrudni 1/19/2026||
I don't know the official process, but as a human who sometimes reads and implements IETF RFCs, I'd appreciate updates to the original doc rather than replacing it with something brand new - probably with some dated version history.

Otherwise I might go to consult my favorite RFC and not even know it's been superseded. And if it has been superseded by a brand-new doc, now I have to start from scratch instead of reading the diff or patch notes to figure out what needs updating.

And if we must supersede, I humbly request that a warning be put at the top, linking to the new standard.

ShroudedNight 1/19/2026||
At one point I could have sworn they were sticking obsoletion notices in the header, but now I can only find them in the right side-bar:

https://datatracker.ietf.org/doc/html/rfc5245

I agree that it would be much more helpful if this were made obvious in the document itself.

It's not obvious that "updated by" notices are treated any more helpfully than "obsoletes" notices.

fweimer 1/19/2026||
There already is an I-D on this topic (based on previous work): https://datatracker.ietf.org/doc/draft-jabley-dnsop-ordered-...
renewiltord 1/19/2026||
Nice analysis. Boy I can’t imagine having to work at Cloudflare on this stuff. A month to get your “small in code” change out only to find some bums somewhere have written code that will make it not work.
urbandw311er 1/19/2026||
Or — hot take — to find out that you made some silly misinterpretation of the RFC that you then felt the need to retrospectively justify.
stackskipton 1/19/2026|||
Or, when working on massive infrastructure like this, you write plenty of tests that would have saved you a month's worth of work.

They write the reordering, push it, the glibc test fires and fails, and they quickly discover: "Crap, tests are failing and a dependency (glibc) doesn't work the way I thought it would."

renewiltord 1/19/2026||
I suspect that if you could save them this time, they'd gladly pay you for it. It'll be a bit of a sell, but they seem like a fairly sensible org.
rjh29 1/19/2026||
It was glibc's resolver that failed - not exactly obscure. It wasn't properly tested or rolled out, plain and simple.
runningmike 1/19/2026||
The end of this blog is... "To learn more about our mission to help build a better Internet, ..."

Reminds me of https://news.ycombinator.com/item?id=37962674 or see https://tech.tiq.cc/2016/01/why-you-shouldnt-use-cloudflare/

therein 1/19/2026||
After the release got reverted, it took 1hr 28min for the deployment to propagate. You'd think that would be a very long time for CloudFlare infrastructure.
rhplus 1/19/2026||
We should probably all be glad that CloudFlare doesn't have the ability to update its entire global fleet any faster than 1h 28m, even if it’s a rollback operation.

Any change to a global service like that, even a rollback (or data deployment or config change), should be released to a subset of the fleet first, monitored, and then rolled out progressively.
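
For illustration only (hypothetical stage fractions and stand-in deploy_to/health_check hooks, not Cloudflare's actual tooling), the shape of such a progressive rollout is roughly:

    import time

    STAGES = [0.001, 0.01, 0.05, 0.25, 1.0]  # fraction of fleet per stage

    def progressive_rollout(fleet, deploy_to, health_check, bake=900):
        done = 0
        for frac in STAGES:
            target = max(done + 1, int(len(fleet) * frac))
            deploy_to(fleet[done:target])       # widen the blast radius
            done = target
            time.sleep(bake)                    # let metrics bake in
            if not health_check(fleet[:done]):  # regression: halt early
                raise RuntimeError(f"halted at {frac:.1%} of fleet")

The same gating applies to a rollback: it is still a change, and a bad revert pushed everywhere at once is its own outage.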

tuetuopay 1/19/2026|||
Given the seriousness of the outages they've caused with instant worldwide deploys, I'm glad they took this one calmly.
steve1977 1/19/2026||
They had to update all the down detectors first.
inkyoto 1/20/2026||
This could be a great fit for Prolog, in fact, as it excels at exactly this kind of search.

Each resolved record would be asserted as a fact, and a tiny search implementation would run after all assertions have been made to resolve the IP address irrespective of the order in which the RRsets have arrived.

A micro Prolog implementation could be rolled into glibc's resolver (or a DNS resolver in general) to solve the problem once and for all.
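
A sketch of the shape of that in Python rather than actual Prolog (records invented for the example; the Prolog rules it mimics are in the docstring):

    # Fact base, as two relations: cname(Name, Alias) and a(Name, IP).
    CNAME = {("www.example.com", "cdn.example.net"),
             ("cdn.example.net", "edge.example.org")}
    A = {("edge.example.org", "192.0.2.1")}

    def resolves(name, seen=frozenset()):
        """resolves(N, IP) :- a(N, IP).
           resolves(N, IP) :- cname(N, M), resolves(M, IP)."""
        if name in seen:                # guard against CNAME loops
            return set()
        ips = {ip for (n, ip) in A if n == name}
        for (n, alias) in CNAME:
            if n == name:
                ips |= resolves(alias, seen | {name})
        return ips

    print(resolves("www.example.com"))  # {'192.0.2.1'}, whatever order
                                        # the facts were asserted in

Because the search runs over an unordered fact base, the answer is the same no matter how the RRsets were interleaved on the wire.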

0xbadcafebee 1/20/2026||
It's kind of weird that they didn't expect this. DNS resolvers are famously inconsistent: the same change can work with some clients and break with others. Virtually any change to what DNS serves, or how it serves it, will cause inconsistent behavior somewhere. (DNS encompasses hundreds of RFCs.)
netfortius 1/20/2026|
Why couldn't a "code-specialized" LLM/AI be added to the change flow in the Cloudflare process and asked to check the change against all known implementations of name-resolution stubs, DNS clients, etc.? If not in cases like this, then when?