Posted by linolevan 1/19/2026

What came first: the CNAME or the A record? (blog.cloudflare.com)
466 points | 162 comments | page 2
tuetuopay 1/19/2026|
Many rightfully interpret the RFC as "CNAMEs have to come before A records", but the issue persists between CNAMEs in the chain, as noted in the article. If a record in the middle of the chain expires, glibc would still fail if that "middle" record were re-inserted between the CNAMEs and the A records.

It’s always DNS.
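To make the failure mode above concrete, here is a minimal sketch of a single-pass parser that only follows CNAMEs in the order they appear. It is a simplified model of the behaviour described in the article, not glibc's actual code; the `Record` tuple shape, the function name, and the example names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record shape for illustration: (owner name, type, data).
Record = Tuple[str, str, str]


def single_pass_addresses(qname: str, answer: List[Record]) -> List[str]:
    """Walk the answer section once, following CNAMEs only as they appear."""
    expected = qname
    addresses = []
    for owner, rtype, data in answer:
        if owner != expected:
            continue  # not the name we are currently looking for: skipped
        if rtype == "CNAME":
            expected = data  # follow the alias to the next name
        elif rtype == "A":
            addresses.append(data)
    return addresses


# Chain in resolution order: www -> a -> b -> edge, then the A record.
in_order = [
    ("www.example.test", "CNAME", "a.example.test"),
    ("a.example.test", "CNAME", "b.example.test"),
    ("b.example.test", "CNAME", "edge.example.test"),
    ("edge.example.test", "A", "192.0.2.1"),
]

# All CNAMEs still precede the A record, but the middle CNAME arrives late.
middle_late = [
    ("www.example.test", "CNAME", "a.example.test"),
    ("b.example.test", "CNAME", "edge.example.test"),
    ("a.example.test", "CNAME", "b.example.test"),
    ("edge.example.test", "A", "192.0.2.1"),
]

print(single_pass_addresses("www.example.test", in_order))
print(single_pass_addresses("www.example.test", middle_late))
```

In this model the second call finds no addresses even though every CNAME comes before the A record, which is the point being made about ordering within the chain.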

wolttam 1/19/2026||
My take on this is quite cynical. This post reads to me like a post-hoc justification of some strange, newly introduced behaviour.

Please order the answer in the order the resolutions were performed to arrive at the final answer (regardless of cache timings). Anything else makes little sense, especially not in the name of some micro-optimization (which could likely be approached in other ways that don’t alter behaviour).

Gigachad 1/19/2026|
The DNS specification should be updated to say CNAMES _must_ be ordered at the top rather than "possibly". Cloudflare was complying with the specification. Cisco was relying on unspecified behavior that happened to be common.
alexey-salmin 1/20/2026|||
The only reasonable interpretation of "possibly prefaced" is that the CNAMEs either come first or not at all (hence "possibly"). Nowhere does the RFC suggest that they may come in the middle.

Something has been broken at Cloudflare for a couple of years. It takes a very specific engineering culture to run the internet and it's just not there anymore.

Dylan16807 1/20/2026||
Except that "first or not at all" doesn't prevent this bug from triggering.

Nowhere does the RFC suggest that multiple CNAMEs need to be in a specific order.

skywhopper 1/20/2026||||
Cloudflare broke clients all over the world. What the 40 year old RFC says is not the de facto “specification” at this point.
alt227 1/20/2026||
Cloudflare broke 'Cisco' clients all over the world. Not CF's problem that the biggest router vendor in the world programmed their routers wrongly.
hdgvhicv 1/20/2026|||
I’m no fan of the centralised internet Cloudflare heralds, but blaming anyone but Cisco for this reboot behaviour is wrong.
m3047 1/19/2026||
DNS is a wire protocol, payload specification, and application protocol. For all of that, I personally wonder whether its enduring success isn't that it's remarkably underspecified when you get to the corner cases.

There's also so much of it, and it mostly works, most of the time. This creates a hysteresis loop in human judgement of efficacy: even a blind chicken gets corn if it's standing in it. Cisco bought cisco., but (a decade ago, when I had access to the firehose) on any given day belkin. would be in the top 10 TLDs if you looked at the NXDOMAIN traffic. Clients don't opportunistically try TCP (which they shouldn't, according to the specification...), but we have DoT (...but should in practice). My ISP's reverse DNS implementation is so bad that qname minimization breaks... but "nobody should be using qname minimization for reverse DNS", and "Spamhaus is breaking the law by casting shades at qname minimization".

"4096 ought to be enough for anybody" (no, frags are bad. see TCP above). There is only ever one request in a TCP connection... hey, what are these two bytes which are in front of the payload in my TCP connection? People who want to believe that their proprietary headers will be preserved if they forward an application protocol through an arbitrary number of intermediate proxy / forwarders (because that's way easier than running real DNS at the segment edge and logging client information at the application level).

Tangential, but: "But there's more to it, because people doing these things typically describe how it works for them (not how it doesn't work) and onlookers who don't pay close attention conclude "it works"." http://consulting.m3047.net/dubai-letters/dnstap-vs-pcap.htm...

teddyh 1/19/2026||
Cloudflare is well known for breaking DNS standards, and also then writing a new RFC to justify their broken behavior, and getting IETF to approve it. (The existence of RFC 8482 is a disgrace to everyone involved.)

> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF

Of course.

alt227 1/20/2026|
This really depends on what side of the fence you are on.

As a website host/maintainer, I am happy that the DNS 'ANY' query has been deprecated.

I am sure if you are a network engineer or ISP, then it probably annoys you no end.

teddyh 1/22/2026||
> As a website host/maintainer, I am happy that the DNS 'ANY' query has been deprecated.

Why? What benefit does this bring you, or what negative consequence would otherwise have resulted for you?

alt227 1/27/2026||
Because without the ANY query it is much more difficult for people to immediately enumerate a full list of all subdomains and IPs for a given domain name. They need to be queried individually.
teddyh 1/28/2026||
That is false. If all you want is all subdomains and IP addresses, you can query each enumerated name for A records; you get any NS records (or CNAME records) on that name for free in the answer, and can follow those. ANY queries are not needed, and their removal does not help you in the slightest.

Is that your only argument?

sebastianmestre 1/19/2026||
I kind of wish they would start sending records in randomized order to take out all the broken implementations that depend on such a fragile property
0xbadcafebee 1/20/2026||
That won't cause implementations to be fixed. The implementations in question are in devices that are old (DNS is over 40 years old) and will never be upgraded. Affected users will just choose a different DNS resolver. Pretty soon word will get around that "if you don't want a broken device, don't use CloudFlare for DNS". It's less hassle for CloudFlare to just maintain the existing de-facto standard.
wolttam 1/19/2026|||
Is the property of an answer being ordered in the order that resolutions were performed to construct it /that/ fragile?

Randomization within the final answer RRSet is fine (and maybe even preferred in a lot of cases)

t0mas88 1/19/2026||
Well, Cisco had their switches get into a boot loop, that sounds very broken...
hdgvhicv 1/20/2026||
Yes, it’s well-known behaviour from these Cisco switches, not just reliant on name ordering. If SBS fails they reboot.

We thought it was just the default NTP servers, but we had some reboots during this event because www.cisco.com was unavailable.

m3047 1/20/2026|||
That would be a Flag Day initiative. ;-)

Honestly, it shouldn't matter. Anybody who's using a stub resolver where this matters, where /anything/ matters really, should be running their own local caching / recursing resolver. These oftentimes have options for e.g. ordering things for various reasons.

frumplestlatz 1/19/2026||
Given my years of experience with Cisco "quality", I'm not surprised by this:

> Another notable affected implementation was the DNSC process in three models of Cisco ethernet switches. In the case where switches had been configured to use 1.1.1.1 these switches experienced spontaneous reboot loops when they received a response containing the reordered CNAMEs.

... but I am surprised by this:

> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.

Not that glibc did anything wrong -- I'm just surprised that anyone is implementing an internet-scale caching resolver without a comprehensive test suite that includes one of the most common client implementations on the planet.

danepowell 1/19/2026||
Doesn't the precipitating change optimize memory on the DNS server at the expense of additional memory usage across millions of clients that now need to parse an unordered response?
Dylan16807 1/19/2026||
The memory involved is a kilobyte. The optimization isn't important anywhere. The fragility is what's important.

Also no, the client doesn't need more memory to parse the out-of-order response, it can take multiple passes through the kilobyte.
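The multi-pass idea above can be sketched roughly as follows: rescan the same answer list once per hop instead of building an index, so the only extra state is the current name. The tuple shape `(owner, type, data)`, the function name, and the `max_hops` limit are assumptions for illustration, and names are compared as plain strings even though real DNS names compare case-insensitively.

```python
from typing import List, Tuple

# Hypothetical record shape for illustration: (owner name, type, data).
Record = Tuple[str, str, str]


def multi_pass_addresses(qname: str, answer: List[Record],
                         max_hops: int = 16) -> List[str]:
    """Resolve a CNAME chain regardless of record order by rescanning."""
    name = qname
    for _ in range(max_hops):
        # One pass over the answer looking for an alias of the current name.
        target = next((data for owner, rtype, data in answer
                       if rtype == "CNAME" and owner == name), None)
        if target is None:
            break  # end of the chain reached
        name = target
    else:
        return []  # hop limit exhausted: loop or overly long chain
    # Final pass: collect the addresses owned by the end of the chain.
    return [data for owner, rtype, data in answer
            if rtype == "A" and owner == name]


shuffled = [
    ("edge.example.test", "A", "192.0.2.1"),
    ("b.example.test", "CNAME", "edge.example.test"),
    ("www.example.test", "CNAME", "a.example.test"),
    ("a.example.test", "CNAME", "b.example.test"),
]
looped = [
    ("a.example.test", "CNAME", "b.example.test"),
    ("b.example.test", "CNAME", "a.example.test"),
]

print(multi_pass_addresses("www.example.test", shuffled))
print(multi_pass_addresses("a.example.test", looped))
```

The cost is extra scans over a response of roughly a kilobyte, traded for no dependence on record order.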

fweimer 1/19/2026||
For most client interfaces, it's possible to just grab the addresses and ignore the CNAMEs altogether because the names do not matter, or only the name on the address record.

Of course, if the server sends unrelated address records in the answer section, that will result in incorrect data. (A simple counter can detect the end of the answer section, so it's not necessary to chase CNAMEs for section separation.)
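A rough sketch of the "ignore the CNAMEs, count the answer section" approach described above. It assumes the message has already been parsed into a flat list of records in wire order, with `ancount` taken from the header's answer count; the function name and tuple shape are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record shape for illustration: (owner name, type, data).
Record = Tuple[str, str, str]


def addresses_from_answer(ancount: int, records: List[Record]) -> List[str]:
    """Collect addresses from the first `ancount` records, skipping CNAMEs."""
    addresses = []
    for index, (owner, rtype, data) in enumerate(records):
        if index >= ancount:
            break  # the counter says the answer section has ended
        if rtype in ("A", "AAAA"):
            addresses.append(data)
    return addresses


# Two answer records followed by authority and additional records.
records = [
    ("www.example.test", "CNAME", "edge.example.test"),
    ("edge.example.test", "A", "192.0.2.1"),
    ("example.test", "NS", "ns1.example.test"),
    ("ns1.example.test", "A", "192.0.2.53"),
]

print(addresses_from_answer(2, records))
```

As noted above, this trusts the server: any unrelated address record placed in the answer section would be returned as well.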

skywhopper 1/20/2026||
This all reads like an embarrassed engineer who can’t admit they neglected to have a comprehensive to-the-byte test suite for their second-most-important-on-the-Internet DNS server, overcompensating by blaming a 40-year-old standard that (1) they probably hadn’t consulted, and (2) no one else seems to have issues with; and proposing to update core Internet standards, rather than just accept that they made a mistake when they assumed they could just append to what any regular user of DNS expects to be a meaningfully-ordered list.
peanut-walrus 1/20/2026||
I've always found it weird that CNAMEs get resolved and lumped into the answer section in the first place. While helpful, this is not what you asked for and it makes much more sense to me to stick that in the additional section instead.

As an aside, I am super annoyed at Cloudflare for calling their proxy records "CNAME" in their UI. Those are nothing like CNAMEs and have caused endless confusion.

kayson 1/19/2026|
> However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC.

Maybe I'm being overly-cynical but I have a hard time believing that they deliberately omitted a test specifically because they reviewed the RFC and found the ambiguous language. I would've expected to see some dialog with IETF beforehand if that were the case. Or some review of the behavior of common DNS clients.

It seems like an oversight, and that's totally fine.

bombcar 1/19/2026||
I took it as being "we wrote the tests to the standard" and then built the code, and whoever was writing the tests didn't read that line as a testable aspect.
kayson 1/19/2026||
Fair enough.
supriyo-biswas 1/19/2026|||
My reading of that statement is their test, assuming they had one, looked something like this:

    rrs = resolver.resolve('www.example.test')
    assert Record("cname1.example.test", type="CNAME") in rrs
    assert Record("192.168.0.1", type="A") in rrs

Which wouldn't have caught the ordering problem.
hdjrudni 1/19/2026||
It's implied that they intentionally tested it that way, without any assertions on the order. Not by oversight or incompetence, but because they didn't want to bake the requirement in due to uncertainty.
skywhopper 1/20/2026|||
That would be silly to stick that tightly to a 40 year old standard. They can easily observe the behavior of every other public DNS resolver (they are Cloudflare, so gathering data on such a scale should be easy) and see how they return results.

Honestly, though, I’d be surprised if they actually even considered it. Everything about the article says to me that the engineer(s) who caused this problem are desperately trying to deflect blame for not having a comprehensive test suite. Sorry, but you don’t go tweaking order of results for such a long-standing, high volume, and crucial protocol just because the 40 year old spec isn’t clear about it.

account42 1/20/2026|||
That approach only makes sense if tests are immutable though. If you are unsure if the order matters you should still test for it so you get a reminder to re-check your assumptions when the order changes.
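A minimal sketch of the difference between a membership-style check and an order-pinning check of the kind suggested above. The `Record` type, the two check functions, and the expected records are hypothetical and say nothing about Cloudflare's real test suite.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    type: str
    data: str


# Expected answer in resolution order (hypothetical example data).
EXPECTED = [
    Record("www.example.test", "CNAME", "cname1.example.test"),
    Record("cname1.example.test", "A", "192.168.0.1"),
]


def membership_check(rrs: List[Record]) -> bool:
    """Passes if every expected record is present, in any order."""
    return all(record in rrs for record in EXPECTED)


def order_check(rrs: List[Record]) -> bool:
    """Passes only if the records appear in exactly the expected order."""
    return list(rrs) == EXPECTED


# Same records, A record first.
reordered = list(reversed(EXPECTED))

print(membership_check(reordered), order_check(reordered))
```

An order-pinning test like this fails as soon as the sequence changes, which is the reminder to revisit the assumption; it can be relaxed deliberately later if the order turns out not to matter.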
mcfedr 1/20/2026||
It's pretty concerning that such a large organisation doesn't do any integration tests with their DNS infrastructure.