Posted by linolevan 1/19/2026

What came first: the CNAME or the A record? (blog.cloudflare.com)
466 points | 162 comments
steve1977 1/19/2026|
I don't find the wording in the RFC to be that ambiguous actually.

> The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.

The "possibly preface" (sic!) to me is obviously to be understood as "if there are any CNAME RRs, the answer to the query is to be prefaced by those CNAME RRs" and not "you can preface the query with the CNAME RRs or you can place them wherever you want".

mrmattyboy 1/19/2026||
I agree this doesn't seem too ambiguous - the RFC says "you may do this.." and they decided "or we may do the reverse". If I say you could prefix something.. the alternative isn't that you can suffix it.

But also.. the programmers working on the software running one of the most important (end-user) DNS servers in the world:

1. Changes logic in how CNAME responses are formed

2. I assume at least some tests broke and needed to be "fixed up" (y'know - "when a CNAME is queried, I expect this response")

3. No one saw these changes in test behavior and thought "I wonder if this order is important", or "We should research this more", or "Are other DNS servers changing the order", or "This should be flagged for a very gradual release".

4. Ends up in the test environment for, what, a month.. nothing using getaddrinfo from glibc was exercising this environment, nor did anyone notice that it was broken

Cloudflare seem to be getting into the swing of breaking things and then being transparent. But this really reads as a fun "did you know", not a "we broke things again - please still use us".

There's no real RCA except to blame an RFC - but honestly, for a large-scale operation like theirs, this seems like a very big thing to slip through the cracks.

I would make a joke about South Park's oil "I'm sorry".. but they don't even seem to be sorry.

black3r 1/19/2026|||
> 4. Ends up in the test environment for, what, a month.. nothing using getaddrinfo from glibc was exercising this environment, nor did anyone notice that it was broken

"Testing environment" sounds to me like a real network real user devices are used with (like the network used inside CloudFlare offices). That's what I would do if I was developing a DNS server anyway, other than unit tests (which obviously wouldn't catch this unless they were explicitly written for this case) and maybe integration/end-to-end tests, which might be running in Alpine Linux containers and as such using musl. If that's indeed the case, I can easily imagine how noone noticed anything was broken. First look at this line:

> Most DNS clients don’t have this issue. For example, systemd-resolved first parses the records into an ordered set:

Now think about what real end-user devices are running: Windows/macOS/iOS obviously aren't using glibc, and Android also has its own C library even though it's Linux-based, so they all probably fall under "Most DNS clients don't have this issue".

That leaves GNU/Linux, where we could reasonably expect most software to use glibc for resolving queries, so presumably anyone using Linux on their laptop would catch this, right? Except most distributions started using systemd-resolved (the most notable exception is Debian, but not many people use that on desktops/laptops), which is a local caching DNS resolver, and as such acts as a middleman between glibc software and the network-configured DNS server, so it would resolve 1.1.1.1 queries correctly and then return the results from its cache ordered by its own ordering algorithm.

skywhopper 1/20/2026|||
For the output of Cloudflare’s DNS server, which serves a huge chunk of the Internet, they absolutely should have a comprehensive byte-by-byte test suite, especially for one of the most common query/result patterns.
account42 1/20/2026|||
> other than unit tests (which obviously wouldn't catch this unless they were explicitly written for this case)

They absolutely should have unit tests that detect any change in output and manually review those changes for an operation of this size.

bpt3 1/19/2026||||
> Ends up in the test environment for, what, a month.. nothing using getaddrinfo from glibc was exercising this environment, nor did anyone notice that it was broken

This is the part that is shocking to me. How is getaddrinfo not called in any unit or system tests?

zinekeller 1/19/2026|||
As black3r mentioned (https://news.ycombinator.com/item?id=46686096), it is likely rearranged by systemd, therefore only non-systemd glibc distributions are affected.

I would hazard a guess that their test environment has both the systemd variant and the Unbound variant (Unbound technically does not arrange them, but instead reconstructs the chain according to the RFC "CNAME restart" logic, because it is a recursive resolver in itself), but not plain directly-piped resolv.conf (presumably because who would run that in this day and age? This is sadly just a half-joke, because only a few people fall into this category).

WGH_ 1/24/2026||
> it is likely rearranged by systemd, therefore only non-systemd glibc distributions are affected.

systemd doesn't imply systemd-resolved is installed and running, though. I believe it's usually not enabled by default.

zinekeller 1/29/2026||
> I believe it's usually not enabled by default.

Just check modern OSes now; they definitely do mediate via systemd-resolved (including server OSes).
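
One quick way to check is whether /etc/resolv.conf points at the local stub. On a stock Ubuntu or Fedora install it typically looks like this (exact header text varies by version):

  $ cat /etc/resolv.conf
  # This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
  # Do not edit.
  nameserver 127.0.0.53
  options edns0 trust-ad

If you see 127.0.0.53 there, every glibc lookup goes through systemd-resolved's cache, not straight to 1.1.1.1.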

SAI_Peregrinus 1/19/2026|||
Probably Alpine containers, so musl's version instead of glibc's.
bashook 1/20/2026||||
I was even more surprised to see that the RFC draft had original text from the author dating back to 2015. https://github.com/ableyjoe/draft-jabley-dnsop-ordered-answe...

We used to say at work that the best way to get promoted was to be the programmer that introduced the bug into production and then fix it. Crazy if true here...

jdmnd 1/20/2026||
What you're suggesting seems like a spectacular leap. I do not think it is very likely that the unnamed employee at Cloudflare who was micro-optimising code in the DNS resolver is also the author of this RFC, Joe Abley (the current Director of Engineering at the company, and formerly Director of DNS Operations at ICANN).
jrochkind1 1/19/2026||||
> I assume some tests at least broke that meant they needed to be "fixed up"

OP said:

"However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC."

One could guess it's something like: back when we wrote the tests, years ago, whoever did it missed that this was required, not helped by the fact that the spec preceded RFC 2119's standardization of the all-caps "MUST"/"SHOULD" language, which would have helped us translate specs to tests more completely.

account42 1/20/2026|||
You'd think that something this widely used would have golden tests that detect any output change to trigger manual review but apparently they don't.
jrochkind1 1/20/2026||
Oh, they explain, if I understand right, that they made the output change intentionally, for performance reasons, based on the inaccurate assumption that order did not matter in DNS responses -- because there are OTHER aspects of DNS responses in which, by spec, order does not matter, and because there were no tests saying order mattered for this component.

> "The order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS." [from RFC]

> However, RFC 1034 doesn’t clearly specify how message sections relate to RRsets.

The developer(s) assumed order didn't matter in general, because the RFC said it didn't for one aspect, and intentionally changed the order for performance reasons. But it turned out the order did matter.

Mistakes of this kind seem unavoidable; this one doesn't necessarily say to me that the developers made a mistake I never could have, or something.

I think the real conclusion is they probably need tests using actual live network stacks with common components -- and why didn't they have those? Not just unit tests or tests with mocks, but tests that would have actually used the real getaddrinfo function in glibc and shown it failing.
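
For what it's worth, such a test needs very little scaffolding. A minimal sketch of the kind of check I mean, in C against the real glibc stub resolver (the hostname is a placeholder; in a real harness it would be a name with a CNAME chain served by the resolver under test, with /etc/resolv.conf pointed at that resolver):

  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <netdb.h>

  int main(void) {
      /* Exercise the exact code path that broke: getaddrinfo() parsing
         a recursive answer that begins with a CNAME chain. */
      struct addrinfo hints = {0}, *res, *p;
      hints.ai_family = AF_INET;
      hints.ai_socktype = SOCK_STREAM;

      int rc = getaddrinfo("www.example.com", NULL, &hints, &res);
      if (rc != 0) {
          fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
          return 1;
      }
      for (p = res; p != NULL; p = p->ai_next) {
          char buf[INET_ADDRSTRLEN];
          struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
          inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf);
          printf("%s\n", buf);
      }
      freeaddrinfo(res);
      return 0;
  }

Run something like that in CI against the candidate resolver and assert it exits 0; a regression like this one should show up the day it lands.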

ibejoeb 1/20/2026|||
Even if there weren't tests for the return order, I would have bet that there were tests of backbone resolvers like getaddrinfo. Is it really possible that the first time anyone noticed that it crashed, or that Ciscos boot-looped, was on a live query?
laixintao 1/20/2026|||
Yes, at least they should test the glibc case.
inopinatus 1/19/2026|||
The article makes it very clear that the ambiguity arises in another phrase: “difference in ordering of the RRs in the answer section is not significant”, which is applied to an example; the problem with examples being that they are illustrative, viz. generalisable, and thus may permit reordering everywhere, and in any case, whether they should or shouldn’t becomes a matter of pragmatic context.

Which goes to show, one person’s “obvious understanding” is another’s “did they even read the entire document”.

All of which also serves to highlight the value of normative language, but that came later.

PunchyHamster 1/20/2026||
it wouldn't be a problem if they tested it properly... especially WHEN stuff is ambiguous
nraynaud 1/20/2026||
They may not have realized the wording was ambiguous until after the incident; that's the kind of thing you realize after you find a bug and do a deep dive into the literature for a post-mortem. They probably worked with the certitude that record order is irrelevant until that point.
the_mitsuhiko 1/19/2026|||
> I don't find the wording in the RFC to be that ambiguous actually.

You might not find it ambiguous but it is ambiguous and there were attempts to fix it. You can find a warmed up discussion about this topic here: https://mailarchive.ietf.org/arch/msg/dnsop/2USkYvbnSIQ8s2vf...

a7b3fa 1/19/2026|||
I agree with you, and I also think that their interpretation of example 6.2.1 in the RFC is somewhat nonsensical. It states that “The difference in ordering of the RRs in the answer section is not significant.” But from the RFC, very clearly this comment is relevant only to that particular example; it is comparing two responses and saying that in this case, the different ordering has no semantic effect.

And perhaps this is somewhat pedantic, but they also write that “RFC 1034 section 3.6 defines Resource Record Sets (RRsets) as collections of records with the same name, type, and class.” But looking at the RFC, it never defines such a term; it does say that within a “set” of RRs “associated with a particular name” the order doesn’t matter. But even if the RFC had said “associated with a particular combination of name, type, and class”, I don’t see how that could have introduced ambiguity. It specifies an exception to a general rule, so obviously if the exception doesn’t apply, then the general rule must be followed.

Anyway, Cloudflare probably know their DNS better than I do, but I did not find the article especially persuasive; I think the ambiguity is actually just a misreading, and that the RFC does require a particular ordering of CNAME records.

(ETA:) Although admittedly, while the RFC does say that CNAMEs must come before As in the answer, I don’t necessarily see any clear rule about how CNAME chains must be ordered; the RFC just says “Domain names in RRs which point at another name should always point at the primary name and not the alias ... Of course, by the robustness principle, domain software should not fail when presented with CNAME chains or loops; CNAME chains should be followed”. So actually I guess I do agree that there is some ambiguity about the responses containing CNAME chains.

taeric 1/19/2026|||
Isn't this literally noted in the article? The article even points out that the RFC is from before normative words were standardized for hard requirements.
devman0 1/19/2026|||
Even if 'possibly preface' is interpreted to mean CNAME RRSets should appear first there is still a broken reliance by some resolvers on the order of CNAME RRsets if there is more than one CNAME in the chain. This expectation of ordering is not promised by the relevant RFCs.
paulddraper 1/19/2026|||
100%

I just commented the same.

It's pretty clear that the "possibly" refers to the presence of the CNAME RRs, not the ordering.

Dylan16807 1/19/2026|||
The context makes it less clear, but even if we pretend that part is crystal, a comment that stops there is missing the point of the article. All CNAMEs at the start isn't enough. The order of the CNAMEs can cause problems despite perfect RFC compliance.
andrewshadura 1/19/2026|||
To me, this reads exactly the opposite.
bigstrat2003 1/20/2026|||
My initial reading was "you can place them wherever you want". And given that multiple parties are naturally interpreting the wording in different ways, that means the wording is ambiguous by definition.
colmmacc 1/20/2026||
I am very petty about this one bug and have a very old axe to grind that this reminded me of! Way back in 2011 CloudFlare launched an incredibly poorly researched feature to just return CNAME records at a domain apex ... RFCs be damned.

https://blog.cloudflare.com/zone-apex-naked-domain-root-doma... , and I quote directly ... "Never one to let a RFC stand in the way of a solution to a real problem, we're happy to announce that CloudFlare allows you to set your zone apex to a CNAME."

The problem? CNAMEs are name level aliases, not record level, so this "feature" would break the caching of NS, MX, and SOA records that exist at domain apexes. Many of us warned them at the time that this would result in a non-deterministic issue. At EC2 and Route 53 we weren't supporting this just to be mean! If a user's DNS resolver got an MX query before an A query, things might work ... but the other way around, they might not. An absolute nightmare to deal with. But move fast and break things, so hey :)

In earnest though ... it's great to see how Cloudflare are now handling CNAME chains and A record ordering issues in this kind of detail. I never would have thought of this implicit contract they've discovered, and it makes sense!

ycombiredd 1/20/2026||
You just caused flashbacks to error messages from BIND of the sort "cannot have CNAME and other data", from this exact proximate cause, and to having to explain the problem many, many times. Confusion and ambiguous understandings have existed since forever among people creating domain RRs (editing zone files) and in the automated or more machined equivalents.

Related: the phrase "CNAME chains" brings back vague memories of confusion surrounding the concepts of "CNAME" and the casual usage of the term "alias". Without re-reading RFC1034 today, I recall that my understanding back in the day was that the "C" was for "canonical", and that the host record the CNAME itself resolved to must itself have an A record and not be another CNAME -- and I acknowledge, per the discussion above, that my "must" is doing a lot of lifting there, since the RFC in question predates the normative-language RFC itself.

So, I don't remember exactly the point I was trying to get at with my second paragraph; maybe it's that there have always been various failure modes due to varying interpretations, which have only compounded with age, new blood, non-standard language being used in self-serve DNS interfaces by providers, etc., which I suppose only strengthens the "ambiguity" claim. That doesn't excuse such a large critical service provider, though, at all.

Dylan16807 1/20/2026||
Is a deliberate violation of a spec really a bug? And I don't think their choice was "move fast and break things" at all.

It is a nightmare, but the spec is the source of the nightmare.

patrickmay 1/19/2026||
A great example of Hyrum's Law:

"With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

combined with failure to follow Postel's Law:

"Be conservative in what you send, be liberal in what you accept."

mmastrac 1/19/2026||
Postel's law has come to be considered more and more harmful as the industry has evolved.
CodesInChaos 1/19/2026|||
That depends on how Postel's law is interpreted.

What's reasonable is: "Set reserved fields to 0 when writing and ignore them when reading." (I heard that was the original example). Or "Ignore unknown JSON keys" as a modern equivalent.

What's harmful is: accept an ill-defined superset of the valid syntax and interpret it in undocumented ways.

treve 1/19/2026|||
Good modern protocols will explicitly define extension points, so 'ignoring unknown JSON keys' is in-spec rather than something an implementer is assumed to do.
tuetuopay 1/19/2026||||
Funny, I never read the original example. And in my book it is harmful, and even worse in JSON, since it's the best way to have a typo somewhere go unnoticed for a long time.
sweetjuly 1/19/2026||
The original example is very common in ISAs at least. Both ARMv8 and RISC-V (likely others too, but I don't have as much experience with them) have the idea of requiring software to treat reserved bits as if they were zero for both reading and writing. ARMv8 calls this RES0, and a hardware implementation is constrained to either being write-ignore for the field (e.g. reads are hardwired to zero) or returning the last successful write.

This is useful as it allows the ISA to remain compatible with code which is unaware of future extensions which define new functionality for these bits so long as the zero value means "keep the old behavior". For example, a system register may have an EnableNewFeature bit, and older software will end up just writing zero to that field (which preserves the old functionality). This avoids needing to define a new system register for every new feature.
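
A toy version of that contract in C (the register layout is invented for illustration): readers mask reserved bits off, writers write them as zero, so a future revision can assign bit 5 a meaning without breaking old binaries:

  #include <stdio.h>
  #include <stdint.h>

  #define CTRL_ENABLE        (1u << 0)   /* defined today */
  /* bit 5 is RES0 today; a future revision may call it EnableNewFeature */
  #define CTRL_DEFINED_MASK  CTRL_ENABLE /* bits this software knows about */

  /* Reader: treat RES0 bits as zero, whatever the hardware returned. */
  static uint32_t ctrl_read(uint32_t raw) {
      return raw & CTRL_DEFINED_MASK;
  }

  /* Writer: RES0 bits go out as zero, which must mean "old behavior". */
  static uint32_t ctrl_write(uint32_t desired) {
      return desired & CTRL_DEFINED_MASK;
  }

  int main(void) {
      uint32_t raw = 0xFFFFFFFFu; /* junk in the reserved bits */
      printf("read:  0x%08x\n", (unsigned)ctrl_read(raw));
      printf("write: 0x%08x\n", (unsigned)ctrl_write(CTRL_ENABLE | (1u << 5)));
      return 0;
  }

Old software writing zero to bit 5 keeps the old behavior; new software can set it deliberately once the bit is defined.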

yxhuvud 1/19/2026|||
I disagree. I find accepting extra random bytes in places to be just as harmful. I prefer APIs that push back and tell me what I did wrong when I mess up.
n2d4 1/19/2026||||
Very much so. A better law would be to be conservative in both what you send and what you accept: it turns out that if you are liberal in what you accept, senders will choose to disobey Postel's law and be liberal in what they send, too.
mikestorrent 1/19/2026|||
It's an oscillation. It goes in cycles. Things formalize upward until you've reinvented XML, SOAP and WSDLs; then a new younger generation comes in and says "all that stuff is boring and tedious, here's this generation's version of duck typing", followed by another ten years of tacking strong types onto that.

MCP seems to be a new round of the cycle beginning again.

Ericson2314 1/20/2026||
No they won't do that, because vibe coding boring tedious shit is easy and looks good to your manager.

I'm dead serious, we should be in a golden age of "programming in the large" formal protocols.

Gigachad 1/19/2026|||
The modern view seems to be you should just immediately abort if the spec isn't being complied with since it's possibly someone trying to exploit the system with malformed data.
esafak 1/19/2026|||
I think it is okay to accept liberally as long as you combine it with warnings for a while to give offenders a chance to fix it.
hdjrudni 1/19/2026|||
"Warnings" are like the most difficult thing to 'send' though. If an app or service doesn't outright fail, warnings can be ignored. Even if not ignored... how do you properly inform? A compiler can spit out warnings to your terminal, sure. Test-runners can log warnings. An RPC service? There's no standard I'm aware of. And DNS! Probably even worse. "Yeah, your RRs are out of order but I sorted them for you." where would you put that?
esafak 1/19/2026|||
> how do you properly inform?

Through the appropriate channels; in-band and out-of-band.

immibis 1/19/2026||
a content-less tautology
diarrhea 1/19/2026|||
Randomly fail or (increasingly) delay a random subset of all requests.
Melonai 1/19/2026||
That sounds awful and will send administrators on a wild goose chase through their stack to find the issue, without many clues except that this thing is failing at seemingly random times. (I myself would suspect something related to network connectivity, maybe requests timing out? That idea would lead me in the completely wrong direction.)

It also does not give any way to actually see a warning message, where would we even put it? I know for a fact that if my glibc DNS resolver started spitting out errors into /var/log/god_knows_what I would take days to find it, at best the resolver could return some kind of errno with perror giving us a message like "The DNS response has not been correctly formatted", and then hope that the message is caught and forwarded through whatever is wrapping the C library, hopefully into our stderr. And there's so many ways even that could fail.

SahAssar 1/19/2026||
So we arrive at the logical conclusion: you send errors in Morse code, encoded as seconds/minutes of failures/successes. Any reasonable person would be able to recognize Morse when seeing the patterns on an observability graph.

Start with milliseconds, move on to seconds and so on as the unwanted behavior continues.

dotancohen 1/19/2026||||
The Python community was famously divided on that matter, wrt Python 3. Now that it is over, most people on the "accept liberally" side of the fence have jumped sides.
psnehanshu 1/19/2026||||
Warnings are ignored. It's much better to fail fast.
wolrah 1/20/2026|||
Warnings only work if the person receiving them is either capable of and motivated to do something about it, or capable of motivating the person/people capable of doing something about it.

A weak warning that's just an entry in a scrolling console means nothing to end users and can be ignored by devs. A strong warning that comes out as a modal dialog can still be ignored by devs and then just annoys users. See the early era of Windows UAC for possibly the most widespread example of a strong warning added after the fact.

ajross 1/19/2026|||
That's true, but sort of misses the spirit of Hyrum's law (which is that the world is filled with obscure edge cases).

In this case the broken resolver was the one in the GNU C Library, hardly an obscure situation!

The news here is sort of buried in the story. Basically Cloudflare just didn't test this. Literally every datacenter in the world was going to fail on this change, probably including their own.

black3r 1/19/2026||
> Literally every datacenter in the world was going to fail on this change

I would expect most datacenters to use their own local recursive caching DNS servers instead of relying on 1.1.1.1 to minimize latency.

stevefan1999 1/20/2026|||
that means you have indirectly created a leaky abstraction, but at the people level
chrisweekly 1/19/2026||
Obligatory xkcd for Hyrum's Law: https://xkcd.com/1172
NelsonMinar 1/19/2026||
It's remarkable that the ordinary DNS lookup function in glibc doesn't work if the records aren't in the right order. It's amazing to me we went 20+ years without that causing more problems. My guess is most people publishing DNS records just sort of knew that the order mattered in practice, maybe figuring it out in early testing.
pixl97 1/19/2026||
I think it's more that the ordering was server-side: there were not that many DNS server implementations out there, and the ones that didn't keep the order quickly changed their behavior because of interop.

CNAMEs are a huge pain in the ass (as noted by DJB: https://cr.yp.to/djbdns/notes.html)

silverwind 1/19/2026|||
It's more likely because the internet runs on a very small number of authoritative server implementations, which all implement this ordering quirk.
immibis 1/19/2026||
This is a recursive resolver quirk
zinekeller 1/20/2026||
... that was perpetuated by BIND.

(Yes, there are other recursive resolver implementations, but they look at BIND as the reference implementation, and absent any contravention of the RFC or intentional design-level decisions, they follow BIND's mechanism.)

account42 1/20/2026||
It's also the most natural way to structure the answer:

Hey, where can I find A?

Answer: A is actually B

Answer: Also B can be found at 42
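
In dig terms (names illustrative), that's the conventional answer-section layout:

  ;; QUESTION SECTION:
  ;www.example.com.              IN  A

  ;; ANSWER SECTION:
  www.example.com.        300   IN  CNAME  example.cdn-host.net.
  example.cdn-host.net.   300   IN  CNAME  edge.cdn-host.net.
  edge.cdn-host.net.      300   IN  A      192.0.2.42

Each record's owner name is the target of the previous one, so a client reading top to bottom never has to look backwards.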

skywhopper 1/20/2026|||
It’s not remarkable, because it’s the way all DNS servers work. Order is important in DNS results. It’s why results with multiple A records are returned in shuffled orders: because that impacts how the client interprets the results. Anyone who works with DNS regularly beyond just reading the RFCs ought to recognize this intuitively.
jeroenhd 1/20/2026|||
People probably ran into this all the time, but no single party large enough to have it gain attention produced the failure state.

If a small business or cloud app can't resolve a domain because the domain is doing something different, it's much easier to blame DNS, use another DNS server, and move on. Or maybe just go "some Linuxes can't reach my website, oh well, sucks for the 1-3%".

Cloudflare is large enough that they caused issues for millions of devices all at once, so they had to investigate.

What's unclear to me is if they bothered to send patches to broken open-source DNS resolvers to fix this issue in the future.

iainmerrick 1/20/2026||
No, because they're not really broken. I think this is fairly clear:

Based on what we have learned during this incident, we have reverted the CNAME re-ordering and do not intend to change the order in the future.

To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF.

That is, explicitly documenting the "broken" behaviour as permitted.

fweimer 1/19/2026||
The last time this came up, people said that it was important to filter out unrelated address records in the answer section (with names to which the CNAME chain starting at the question name does not lead). Without the ordering constraint (or a rather low limit on the number of CNAMEs in a response), this needs a robust data structure for looking up DNS names. Most in-process stub resolvers (including the glibc one) do not implement a DNS cache, so they presently do not have a need to implement such a data structure. This is why eliminating the ordering constraint while preserving record filtering is not a simple code change.
Dylan16807 1/20/2026|||
Doesn't it need to go through the CNAME chain no matter what? If it's doing that, isn't filtering at most tracking all the records that matched? That requires a trivial data structure.

Parsing the answer section in a single pass requires more finesse, but does it need fancier data structures than a string to string map? And failing that you can loop upon CNAME. I wouldn't call a depth limit like 20 "a rather low limit on the number of CNAMEs in a response", and max 20 passes through a max 64KB answer section is plenty fast.
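
A sketch of that "loop upon CNAME" approach in C, over an already-parsed answer section (toy types; a real resolver works on wire-format RRs, but the control flow is the same):

  #include <stdio.h>
  #include <string.h>

  struct rr { const char *name; const char *type; const char *data; };

  #define MAX_CHAIN 20  /* the depth limit discussed above */

  static const char *chase(const struct rr *ans, int n, const char *qname) {
      for (int depth = 0; depth < MAX_CHAIN; depth++) {
          int followed = 0;
          for (int i = 0; i < n; i++) {
              if (strcmp(ans[i].name, qname) != 0)
                  continue;
              if (strcmp(ans[i].type, "A") == 0)
                  return ans[i].data;       /* reached an address */
              if (strcmp(ans[i].type, "CNAME") == 0) {
                  qname = ans[i].data;      /* follow the alias... */
                  followed = 1;
                  break;                    /* ...and rescan from the top */
              }
          }
          if (!followed)
              return NULL;                  /* dead end: no record for qname */
      }
      return NULL;                          /* chain too long (or a loop) */
  }

  int main(void) {
      /* Deliberately out of order: the A record comes first. */
      struct rr ans[] = {
          { "edge.cdn-host.net.",    "A",     "192.0.2.42" },
          { "www.example.com.",      "CNAME", "example.cdn-host.net." },
          { "example.cdn-host.net.", "CNAME", "edge.cdn-host.net." },
      };
      const char *a = chase(ans, 3, "www.example.com.");
      printf("%s\n", a ? a : "(unresolved)");
      return 0;
  }

Worst case is MAX_CHAIN passes over the section, which is cheap even for a 64KB response; the extra finesse is only needed if you insist on a single pass.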

fweimer 1/20/2026||
I don't know if the 20 limit is large enough in practice. People do weird things (after migrating from non-DNS naming services, for example). Then there is label compression, so you can theoretically have several thousand RRs in a single 64 KiB response. These numbers are large enough that a simple multi-pass approach is probably not a good idea.

And practically speaking, none of this CNAME-chain chasing adds any functionality because recursive servers are expected to produce ready-to-use answers.

paulddraper 1/19/2026||
> RFC 1034, published in 1987, defines much of the behavior of the DNS protocol, and should give us an answer on whether the order of CNAME records matters. Section 4.3.1 contains the following text:

> If recursive service is requested and available, the recursive response to a query will be one of the following:

> - The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.

> While "possibly preface" can be interpreted as a requirement for CNAME records to appear before everything else, it does not use normative key words, such as MUST and SHOULD that modern RFCs use to express requirements. This isn’t a flaw in RFC 1034, but simply a result of its age. RFC 2119, which standardized these key words, was published in 1997, 10 years after RFC 1034.

It's pretty clear that CNAME is at the beginning.

The "possibly" does not refer to the order but rather to the presence.

If they are present, they come first.

urbandw311er 1/19/2026||
The whole world knows this except Cloudflare who actually did know it but are now trying to pretend that they didn’t.
kiwijamo 1/19/2026||
Some people (myself included) read that as "would ideally come first, but it is not necessary that it comes first". The language is not clear IMHO and could be worded better.
paulddraper 1/21/2026|||
The possibility is "preface by one or more CNAME RRs..."

I.e. the "possibly" logically applies to the entire phrase, not just a part of it.

"The answer to the query" - possibly - "prefaced by one or more CNAME RRs that specify aliases encountered on the way to an answer"

afiori 1/20/2026|||
In my native language the literal translation of "possibly" has a distinct "preferably" meaning, but I feel that in English it does not.

It might be a victim of the polite/ironic/sarcastic influences on language that turn innocuous words into contronyms.

linsomniac 1/19/2026||
>While in our interpretation the RFCs do not require CNAMEs to appear in any particular order

That seems like some doubling-down BS to me, since they earlier say "It's ambiguous because it doesn't use MUST or SHOULD, which was introduced a decade after the DNS RFC." The RFC says:

>The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.

How do you get from that, in the face of "MUST" being defined a decade later, to "I guess I can append the CNAME to the answer"?

Holding onto "we still think the RFC allows it" is a problem. The world is a lot better if you can just admit to your mistakes and move on. I try to model this at home and at work, because trying to "language lawyer" your way out of being wrong makes the world a worse place.

skywhopper 1/20/2026|
The RFC is also 39 years old! At this point, DNS is what existing software expects it to be, not what someone proposed in the mid-eighties. The fact that they did not have any testing matching exact byte-for-byte responses against existing behavior and other DNS resolvers for this layer of service is massively irresponsible.
bwblabs 1/19/2026||
I will hijack this post to point out that CloudFlare really doesn't understand RFC1034: their authoritative DNS interface only blocks A and AAAA records if there is a CNAME defined. E.g. see this:

  $ echo "A AAAA CAA CNAME DS HTTPS LOC MX NS TXT" | sed -r 's/ /\n/g' | sed -r 's/^/rfc1034.wlbd.nl /g' | xargs dig +norec +noall +question +answer +authority @coco.ns.cloudflare.com
  ;rfc1034.wlbd.nl.  IN A
  rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
  ;rfc1034.wlbd.nl.  IN AAAA
  rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
  ;rfc1034.wlbd.nl.  IN CAA
  rfc1034.wlbd.nl. 300 IN CAA 0 issue "really"
  ;rfc1034.wlbd.nl.  IN CNAME
  rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
  ;rfc1034.wlbd.nl.  IN DS
  rfc1034.wlbd.nl. 300 IN DS 0 13 2 21A21D53B97D44AD49676B9476F312BA3CEDB11DDC3EC8D9C7AC6BAC A84271AE
  ;rfc1034.wlbd.nl.  IN HTTPS
  rfc1034.wlbd.nl. 300 IN HTTPS 1 . alpn="h3"
  ;rfc1034.wlbd.nl.  IN LOC
  rfc1034.wlbd.nl. 300 IN LOC 0 0 0.000 N 0 0 0.000 E 0.00m 0.00m 0.00m 0.00m
  ;rfc1034.wlbd.nl.  IN MX
  rfc1034.wlbd.nl. 300 IN MX 0 .
  ;rfc1034.wlbd.nl.  IN NS
  rfc1034.wlbd.nl. 300 IN NS rfc1034.wlbd.nl.
  ;rfc1034.wlbd.nl.  IN TXT
  rfc1034.wlbd.nl. 300 IN TXT "Check my cool label serving TXT and a CNAME, in violation with RFC1034"
The result is that DNS resolvers (including CloudFlare Public DNS) will give cache-dependent results if you query e.g. a TXT record (depending on whether the CNAME is cached). At internet.nl (https://github.com/internetstandards/) we found out because some people claimed to have a TXT DMARC record while also CNAMEing that name (which gives cache-dependent results; and since internet.nl uses RFC 9156 QName Minimisation, it first resolves A, therefore caches the CNAME, and will never see the TXT). People configure things similar to the https://mxtoolbox.com/dmarc/dmarc-setup-cname instructions (which I find in conflict with RFC1034).
ZoneZealot 1/19/2026|
> People configure things similar to https://mxtoolbox.com/dmarc/dmarc-setup-cname instructions (which I find in conflict with RFC1034).

I don't think they're advising anyone to create both a CNAME and a TXT at the same label - but it certainly looks like that from the weird screenshot at step 5 (which doesn't match the text).

I think it's mistakenly a mish-mash of two different guides, one for 'how to use a CNAME to point to a third party DMARC service entirely' and one for 'how to host the DMARC record yourself' (irrespective of where the RUA goes).

bwblabs 1/19/2026||
I'm not sure, but we're seeing this specifically with _dmarc CNAMEing to '.hosted.dmarc-report.com' together with a TXT record type; also see this discussion of users asking for this at deSEC: https://talk.desec.io/t/cannot-create-cname-and-txt-record-f...

My main point, however, was that it's really not okay that CloudFlare allows setting up other record types (e.g. TXT, but basically any) next to a CNAME.

ycombiredd 1/20/2026||
Yes. This type of behavior is what I was referring to in my earlier comment about flashbacks to logs from named filled with "cannot have CNAME and other data", and slapping my forehead asking "who keeps doing this?", in the days when editing zone files by hand was the norm. And then, of course, having repeats of this feeling as tools were built, automations became increasingly common, and large service providers "standardized" interfaces (ostensibly to ensure correctness) allowing or even encouraging the creation of bad zone configurations.

The more things change, the more things stay the same. :-)

forinti 1/19/2026||
> While in our interpretation the RFCs do not require CNAMEs to appear in any particular order, it’s clear that at least some widely-deployed DNS clients rely on it. As some systems using these clients might be updated infrequently, or never updated at all, we believe it’s best to require CNAME records to appear in-order before any other records.

That's the only reasonable conclusion, really.

hdjrudni 1/19/2026|
And I'm glad they came to it. Even if everyone else is wrong (I'm not saying they are) sometimes you just have to play along.
WorldMaker 1/20/2026||
Hopefully Cloudflare documenting the expected behavior, and that documentation possibly getting standards-tracked, will make things easier for the next RFC readers.
seiferteric 1/19/2026||
Now that I have seemingly taken on managing DNS at my current company, I have seen several inadequacies of DNS that I was not aware of before. The main one: if an upstream DNS server returns SERVFAIL, there is really no distinction between the server you are querying having failed and the actual authoritative server upstream being broken (I am aware of EDEs, but they don't really solve this). So clients querying a broken domain will retry each of their configured DNS servers, and our caching layer (Unbound) will also retry each of its upstreams, etc. This results in a bunch of pointless upstream queries, like an amplification attack. I also have issues with the search path causing stupid queries that NXDOMAIN, like badname.company.com, badname.company.othername.com... etc.
simoncion 1/20/2026||
> So clients querying a broken domain will retry each of their configured DNS servers, and our caching layer (Unbound) will also retry each of its upstreams, etc.

I expect this is why BIND 9 has the 'servfail-ttl' option. [0]

Turns out that there's a standards-track RFC from 1998 that explicitly permits caching SERVFAIL responses. [1] Section 8 of that document suggests that this behavior was permitted by RFC 1034 (published back in 1987).

[0] <https://bind9.readthedocs.io/en/v9.18.42/reference.html#name...>

[1] <https://www.rfc-editor.org/rfc/rfc2308#section-7.1>
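
A sketch of the relevant knob in named.conf (value illustrative; per the BIND 9 docs this is capped at 30 seconds, and 0 disables the cache):

  options {
      // Cache SERVFAIL responses briefly so a broken authoritative
      // zone doesn't turn every client retry into a fresh upstream query.
      servfail-ttl 5;
  };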

JackSlateur 1/24/2026|||
DNS search is stupid by itself and shall be avoided everywhere.

For your sanity, only deal with FQDNs.
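
To make the parent's badname example concrete (illustrative config):

  # /etc/resolv.conf
  search company.com company.othername.com
  nameserver 192.0.2.53

  # A lookup for a bare "badname" walks the search list first:
  #   badname.company.com            -> NXDOMAIN
  #   badname.company.othername.com  -> NXDOMAIN
  #   (then, depending on ndots and resolver, the bare name itself)
  # A trailing dot ("badname.") bypasses the search list entirely.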

indigodaddy 1/19/2026||
re: your SERVFAIL observation, oh man did I run into this exact issue about a year or so ago when this came up for a particular zone. All I was doing was troubleshooting it on the caching server. Took me a day or two to actually look at the auth server and find out that the issue actually originated there.
mdavid626 1/19/2026|
I would expect that DNS servers at this scale, like 1.1.1.1, have integration tests running real resolvers, like the one in glibc. How come this issue was only discovered in production?
t0mas88 1/19/2026||
This case would only happen if a CNAME chain first expired from the cache in the wrong order and then was subsequently queried via glibc. Their tests may check both that glibc resolving works and that re-querying expired records works, but not the combination of the two.
mdavid626 1/20/2026||
I’d test such scenarios as well. Run many real glibc resolvers for a while. Sooner or later caching issue would surface.
tcdent 1/20/2026||
Agreed. Seems like a pretty risky optimization that fundamentally changed behavior; like it or not the ordering of vectors is often part of the data structure.

Could have just used a prepend to preserve behavior instead of going down the rabbit hole of re-interpreting the RFC (which is a cop-out IMO; it worked before, and a change broke it).
