Posted by linolevan 1/19/2026
It’s always DNS.
Please order the answer in the order the resolutions were performed to arrive at the final answer (regardless of cache timings). Anything else makes little sense, especially not in the name of some micro-optimization (which could likely be approached in other ways that don’t alter behaviour).
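A minimal sketch of what "resolution order" means for the answer section. It assumes the records have already been parsed into a simple, hypothetical `Record` type, and it is illustrative only, not Cloudflare's or any resolver's code. The function starts at the query name, emits that owner's records, and then follows its CNAME to the next owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:  # hypothetical parsed resource record
    name: str   # owner name
    rtype: str  # "CNAME", "A", ...
    data: str   # CNAME target or address


def order_by_resolution(qname: str, answers: list[Record]) -> list[Record]:
    """Order answer records the way a resolver would have followed them:
    each owner's records, then the owner its CNAME points to, and so on.
    Records not reachable from qname are kept, but moved to the end."""
    remaining = list(answers)
    ordered: list[Record] = []
    current = qname.lower()
    seen: set[str] = set()
    while current not in seen:  # stop on a CNAME loop
        seen.add(current)
        owned = [r for r in remaining if r.name.lower() == current]
        if not owned:
            break
        ordered.extend(owned)
        remaining = [r for r in remaining if r.name.lower() != current]
        cname = next((r for r in owned if r.rtype == "CNAME"), None)
        if cname is None:
            break  # reached the final RRset
        current = cname.data.lower()
    return ordered + remaining


answers = [
    Record("b.example.test", "A", "192.0.2.1"),
    Record("a.example.test", "CNAME", "b.example.test"),
    Record("www.example.test", "CNAME", "a.example.test"),
]
print([r.name for r in order_by_resolution("www.example.test", answers)])
# ['www.example.test', 'a.example.test', 'b.example.test']
```

Unreachable records are kept rather than dropped, so the function only reorders and never changes what the response contains.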
Something has been broken at Cloudflare for a couple of years now. It takes a very specific engineering culture to run the internet and it's just not there anymore.
Nowhere does the RFC suggest that multiple CNAMEs need to be in a specific order.
There's also so much of it, and it mostly works, most of the time. This creates a hysteresis loop in human judgement of efficacy: even a blind chicken gets corn if it's standing in it. Cisco bought cisco., but (a decade ago, when I had access to the firehose) on any given day belkin. would be in the top 10 TLDs if you looked at the NXDOMAIN traffic. Clients don't opportunistically try TCP (which they shouldn't, according to the specification...), but we have DoT (...but should in practice). My ISP's reverse DNS implementation is so bad that qname minimization breaks... but "nobody should be using qname minimization for reverse DNS", and "Spamhaus is breaking the law by casting shade at qname minimization".
"4096 ought to be enough for anybody" (no, frags are bad; see TCP above). There is only ever one request in a TCP connection... hey, what are these two bytes in front of the payload in my TCP connection? And then there are people who want to believe that their proprietary headers will be preserved if they forward an application protocol through an arbitrary number of intermediate proxies / forwarders (because that's way easier than running real DNS at the segment edge and logging client information at the application level).
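The "two bytes" are the length prefix that DNS over TCP puts in front of every message (RFC 1035, section 4.2.2), and a single connection can carry more than one message. Here is a minimal Python sketch of that framing. It uses placeholder byte strings rather than real DNS messages, and `frame`/`unframe` are hypothetical helper names.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its length as a 16-bit big-endian integer,
    the framing DNS over TCP uses (RFC 1035, section 4.2.2)."""
    if len(message) > 0xFFFF:
        raise ValueError("message too long for a 2-byte length prefix")
    return struct.pack("!H", len(message)) + message


def unframe(stream: bytes) -> list[bytes]:
    """Split bytes read from one TCP connection into the messages it carries.
    Stops at the first incomplete message; a real client would read more."""
    messages = []
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        start = offset + 2
        if start + length > len(stream):
            break
        messages.append(stream[start:start + length])
        offset = start + length
    return messages


# Two placeholder payloads sent back to back on the same connection.
stream = frame(b"first") + frame(b"second")
print(stream[:2])        # b'\x00\x05'
print(unframe(stream))   # [b'first', b'second']
```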
Tangential, but: "But there's more to it, because people doing these things typically describe how it works for them (not how it doesn't work) and onlookers who don't pay close attention conclude "it works"." http://consulting.m3047.net/dubai-letters/dnstap-vs-pcap.htm...
> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF
Of course.
As a website host/maintainer, I am happy that the DNS 'ANY' query has been deprecated.
I am sure that if you are a network engineer or ISP, it probably annoys you no end.
Why? What benefit does this bring you, or what negative consequence would otherwise have resulted for you?
Is that your only argument?
Randomization within the final answer RRset is fine (and maybe even preferred in a lot of cases).
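The sketch below illustrates that distinction. It assumes the answer list is already in resolution order, uses a hypothetical `Record` type, and contains no DNSSEC signatures. Only the trailing run of non-CNAME records (the final RRset) is shuffled, and the CNAME chain in front of it stays in place.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:  # hypothetical parsed resource record
    name: str
    rtype: str
    data: str


def shuffle_final_rrset(answers: list[Record],
                        rng: Optional[random.Random] = None) -> list[Record]:
    """Return a copy with only the trailing non-CNAME records (the final
    RRset) shuffled; the CNAME chain before them keeps its order."""
    rng = rng or random.Random()
    split = len(answers)
    while split > 0 and answers[split - 1].rtype != "CNAME":
        split -= 1
    tail = answers[split:]  # slicing copies, so the input list is untouched
    rng.shuffle(tail)
    return answers[:split] + tail


answers = [
    Record("www.example.test", "CNAME", "a.example.test"),
    Record("a.example.test", "A", "192.0.2.1"),
    Record("a.example.test", "A", "192.0.2.2"),
    Record("a.example.test", "A", "192.0.2.3"),
]
shuffled = shuffle_final_rrset(answers)
print(shuffled[0].rtype)  # CNAME
```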
We thought it was just the default NTP servers, but we had some reboots during this event because www.cisco.com was unavailable.
Honestly, it shouldn't matter. Anybody who's using a stub resolver where this matters, where /anything/ matters really, should be running their own local caching / recursing resolver. These oftentimes have options for e.g. ordering things for various reasons.
> Another notable affected implementation was the DNSC process in three models of Cisco ethernet switches. In the case where switches had been configured to use 1.1.1.1 these switches experienced spontaneous reboot loops when they received a response containing the reordered CNAMEs.
... but I am surprised by this:
> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.
Not that glibc did anything wrong -- I'm just surprised that anyone is implementing an internet-scale caching resolver without a comprehensive test suite that includes one of the most common client implementations on the planet.
Also no, the client doesn't need more memory to parse the out-of-order response; it can take multiple passes through the kilobyte.
Of course, if the server sends unrelated address records in the answer section, that will result in incorrect data. (A simple counter can detect the end of the answer section, so it's not necessary to chase CNAMEs for section separation.)
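Here is a sketch of the multi-pass approach. It assumes `answers` holds exactly the ANCOUNT records from the header, so the section boundary is already known, and it uses a hypothetical parsed `Record` type. Each CNAME hop is one more linear pass over the same list, so the order of the records does not matter. The result contains only records owned by the last name in the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:  # hypothetical parsed resource record
    name: str
    rtype: str
    data: str


def chase(qname: str, answers: list[Record], qtype: str = "A") -> list[str]:
    """Follow the CNAME chain from qname through an answer section in any
    order, then return the qtype data owned by the last name in the chain.
    Each hop is one more linear pass over the same list."""
    current = qname.lower()
    for _ in range(len(answers)):  # bound the hops so a CNAME loop terminates
        target = next((r.data for r in answers
                       if r.name.lower() == current and r.rtype == "CNAME"), None)
        if target is None:
            break
        current = target.lower()
    return [r.data for r in answers
            if r.name.lower() == current and r.rtype == qtype]


# Reversed from resolution order on purpose.
answers = [
    Record("b.example.test", "A", "192.0.2.1"),
    Record("a.example.test", "CNAME", "b.example.test"),
    Record("www.example.test", "CNAME", "a.example.test"),
]
print(chase("www.example.test", answers))  # ['192.0.2.1']
```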
As an aside, I am super annoyed at Cloudflare for calling their proxy records "CNAME" in their UI. Those are nothing like CNAMEs and have caused endless confusion.
Maybe I'm being overly-cynical but I have a hard time believing that they deliberately omitted a test specifically because they reviewed the RFC and found the ambiguous language. I would've expected to see some dialog with IETF beforehand if that were the case. Or some review of the behavior of common DNS clients.
It seems like an oversight, and that's totally fine.
rrs = resolver.resolve('www.example.test')
assert Record("cname1.example.test", type="CNAME") in rrs
assert Record("192.168.0.1", type="A") in rrs
Which wouldn't have caught the ordering problem.
Honestly, though, I’d be surprised if they actually even considered it. Everything about the article says to me that the engineer(s) who caused this problem are desperately trying to deflect blame for not having a comprehensive test suite. Sorry, but you don’t go tweaking the order of results for such a long-standing, high-volume, and crucial protocol just because the 40-year-old spec isn’t clear about it.
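For contrast with the membership-only test quoted above, here is a hypothetical order-sensitive check. It walks the answer list and requires every record's owner to be the name the CNAME chain has reached so far. The `Record` type and `in_resolution_order` are illustrative names, not from any real test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:  # hypothetical parsed resource record
    name: str
    rtype: str
    data: str


def in_resolution_order(qname: str, answers: list[Record]) -> bool:
    """True if each record is owned by the name the CNAME chain has reached
    so far, i.e. every CNAME comes before the records for its target."""
    current = qname.lower()
    for r in answers:
        if r.name.lower() != current:
            return False
        if r.rtype == "CNAME":
            current = r.data.lower()
    return True


rrs = [
    Record("www.example.test", "CNAME", "cname1.example.test"),
    Record("cname1.example.test", "A", "192.168.0.1"),
]
assert in_resolution_order("www.example.test", rrs)
# The same records, reordered: a membership test passes, this one doesn't.
assert not in_resolution_order("www.example.test", rrs[::-1])
```

This check is deliberately strict. A record for an unrelated name anywhere in the list also fails it.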