Posted by linolevan 3 hours ago
> The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
The "possibly preface" (sic!) to me is obviously to be understood as "if there are any CNAME RRs, the answer to the query is to be prefaced by those CNAME RRs" and not "you can preface the query with the CNAME RRs or you can place them wherever you want".
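Under that reading, a server emits the CNAME chain first, in the order it is followed, and only then the records for the final target. Here is a minimal sketch, assuming a simplified record shape. The `Record` tuple and `order_answer` function are hypothetical and are not any real server's API.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical, simplified answer record (class and TTL omitted)."""
    name: str
    rtype: str
    data: str


def order_answer(qname: str, records: list[Record]) -> list[Record]:
    """Put the CNAME chain first, in chain order, then everything else."""
    cnames = {r.name: r for r in records if r.rtype == "CNAME"}
    ordered: list[Record] = []
    seen: set[str] = set()
    name = qname
    # Follow the chain from the query name, stopping at its end or on a loop.
    while name in cnames and name not in seen:
        seen.add(name)
        ordered.append(cnames[name])
        name = cnames[name].data
    # Remaining records (e.g. the A records for the final target) go last.
    ordered.extend(r for r in records if r not in ordered)
    return ordered


shuffled = [
    Record("edge.example.net", "A", "192.0.2.1"),
    Record("cdn.example.net", "CNAME", "edge.example.net"),
    Record("www.example.com", "CNAME", "cdn.example.net"),
]
for r in order_answer("www.example.com", shuffled):
    print(r.name, r.rtype, r.data)
```

This prints the `www.example.com` CNAME, then the `cdn.example.net` CNAME, then the A record, regardless of how the input was shuffled.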
But also.. the programmers working on the software running one of the most important (end-user) DNS servers in the world:
1. Changes logic in how CNAME responses are formed
2. I assume at least some tests broke and needed to be "fixed up" (y'know - "when a CNAME is queried, I expect this response")
3. No one saw these changes in test behavior and thought "I wonder if this order is important". Or "We should research more into this". Or "Are other DNS servers changing order". Or "This should be flagged for a very gradual release".
4. Ends up in a test environment for, what, a month.. and either nothing using getaddrinfo from glibc was used to test this environment, or nobody noticed that it was broken
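Why would order matter at all? Suppose a client walks the answer section once, accepting only records whose owner name matches the name it is currently chasing. Such a client silently drops address records that arrive before the CNAME pointing at them. The sketch below is a simplified, hypothetical model of that kind of single-pass logic. It is not glibc's code, and the `(owner, type, data)` tuples are an assumption for illustration.

```python
# Hypothetical record shape: (owner_name, rtype, data).
Answer = list[tuple[str, str, str]]


def single_pass_addresses(qname: str, answer: Answer) -> list[str]:
    """Walk the answer once, in order, following CNAMEs as they appear."""
    expected = qname
    addresses = []
    for owner, rtype, data in answer:
        if owner != expected:
            continue  # not the name we are chasing right now: skipped for good
        if rtype == "CNAME":
            expected = data  # from here on, chase the alias target
        elif rtype == "A":
            addresses.append(data)
    return addresses


ordered = [
    ("www.example.com", "CNAME", "edge.example.net"),
    ("edge.example.net", "A", "192.0.2.1"),
]
reordered = list(reversed(ordered))

print(single_pass_addresses("www.example.com", ordered))    # ['192.0.2.1']
print(single_pass_addresses("www.example.com", reordered))  # []
```

Both responses carry the same records. Only their order differs, and the single-pass model finds no address in the reordered one.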
Cloudflare seem to be getting into the swing of breaking things and then being transparent. But this really reads as a fun "did you know", not a "we broke things again - please still use us".
There's no real RCA except to blame an RFC - but honestly, for a large-scale operation like theirs, this seems like a very big thing to slip through the cracks.
I would make a joke about South Park's oil "I'm sorry".. but they don't even seem to be sorry.
Which goes to show, one person’s “obvious understanding” is another’s “did they even read the entire document”.
All of which also serves to highlight the value of normative language, but that came later.
And perhaps this is somewhat pedantic, but they also write that “RFC 1034 section 3.6 defines Resource Record Sets (RRsets) as collections of records with the same name, type, and class.” But looking at the RFC, it never defines such a term; it does say that within a “set” of RRs “associated with a particular name” the order doesn’t matter. But even if the RFC had said “associated with a particular combination of name, type, and class”, I don’t see how that could have introduced ambiguity. It specifies an exception to a general rule, so obviously if the exception doesn’t apply, then the general rule must be followed.
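Grouping an answer into RRsets keyed by (name, type, class) makes the dispute concrete. Here is a minimal sketch, assuming records are plain `(name, rtype, rclass, data)` tuples. The representation and the `rrsets` helper are hypothetical. Each RRset is stored as a frozenset, so order inside it is forgotten by construction.

```python
from collections import defaultdict

# Hypothetical record shape: (name, rtype, rclass, data).
Rec = tuple[str, str, str, str]


def rrsets(records: list[Rec]) -> dict:
    """Group records into RRsets keyed by (name, type, class).

    Each RRset becomes a frozenset, so the order of records inside an
    RRset, and the order of the RRsets themselves, is discarded."""
    groups = defaultdict(set)
    for name, rtype, rclass, data in records:
        groups[(name, rtype, rclass)].add(data)
    return {key: frozenset(values) for key, values in groups.items()}


a = [
    ("www.example.com", "CNAME", "IN", "edge.example.net"),
    ("edge.example.net", "A", "IN", "192.0.2.1"),
    ("edge.example.net", "A", "IN", "192.0.2.2"),
]
b = [a[0], a[2], a[1]]  # shuffled inside the A RRset only
c = [a[1], a[2], a[0]]  # CNAME moved after the A records

print(rrsets(a) == rrsets(b))  # True
print(rrsets(a) == rrsets(c))  # True
```

The first comparison is uncontroversial. The second is the question in dispute: an RRset-level view treats `a` and `c` as equivalent, but the CNAME and the A records have different owner names. On the reading argued in the comment, that move is therefore not covered by the "order doesn't matter" exception.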
Anyway, Cloudflare probably know their DNS better than I do, but I did not find the article especially persuasive; I think the ambiguity is actually just a misreading, and that the RFC does require a particular ordering of CNAME records.
(ETA:) Although admittedly, while the RFC does say that CNAMEs must come before As in the answer, I don’t necessarily see any clear rule about how CNAME chains must be ordered; the RFC just says “Domain names in RRs which point at another name should always point at the primary name and not the alias ... Of course, by the robustness principle, domain software should not fail when presented with CNAME chains or loops; CNAME chains should be followed”. So actually I guess I do agree that there is some ambiguity about the responses containing CNAME chains.
I just commented the same.
It's pretty clear that the "possibly" refers to the presence of the CNAME RRs, not the ordering.
"With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."
combined with a failure to follow Postel's Law:
"Be conservative in what you send, be liberal in what you accept."
What's reasonable is: "Set reserved fields to 0 when writing and ignore them when reading." (I heard that was the original example). Or "Ignore unknown JSON keys" as a modern equivalent.
What's harmful is: Accept an ill defined superset of the valid syntax and interpret it in undocumented ways.
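The two reasonable forms each fit in a few lines. Here is a minimal sketch, assuming a hypothetical one-byte flags field whose low four bits are defined and whose high four bits are reserved, plus a hypothetical JSON config with two known keys. Neither corresponds to a specific real protocol.

```python
import json

DEFINED_BITS = 0x0F  # hypothetical: bits 0-3 have meaning, bits 4-7 are reserved


def write_flags(flags: int) -> bytes:
    """Conservative sender: reserved bits are always written as 0."""
    return bytes([flags & DEFINED_BITS])


def read_flags(raw: bytes) -> int:
    """Liberal receiver: reserved bits are ignored, neither rejected nor interpreted."""
    return raw[0] & DEFINED_BITS


KNOWN_KEYS = {"host", "port"}  # hypothetical config schema


def read_config(text: str) -> dict:
    """Keep the keys we understand and silently ignore any others."""
    data = json.loads(text)
    return {k: v for k, v in data.items() if k in KNOWN_KEYS}


print(read_flags(b"\xf5"))  # 5: the high nibble is ignored
print(read_config('{"host": "a", "port": 53, "future_option": true}'))
```

Neither reader guesses at what the unknown bits or keys mean. Guessing is exactly what the "harmful" version does.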
Through the appropriate channels: in-band and out-of-band.
In this case the broken resolver was the one in the GNU C Library, hardly an obscure situation!
The news here is sort of buried in the story. Basically Cloudflare just didn't test this. Literally every datacenter in the world was going to fail on this change, probably including their own.
CNAMEs are a huge pain in the ass (as noted by DJB https://cr.yp.to/djbdns/notes.html)
Reminds me of https://news.ycombinator.com/item?id=37962674 or see https://tech.tiq.cc/2016/01/why-you-shouldnt-use-cloudflare/
That's the only reasonable conclusion, really.
Also, no, the client doesn't need more memory to parse the out-of-order response; it can take multiple passes through the kilobyte.
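The multi-pass approach is short. Here is a minimal sketch, assuming answer records are hypothetical `(owner, type, data)` tuples rather than any real resolver's structures. The first pass indexes the CNAMEs by owner name. The client then follows the chain from the query name, with a `seen` set to stop on loops. A second pass collects the A records for the final name. Record order never comes into it, and the extra state is one small dict.

```python
# Hypothetical record shape: (owner_name, rtype, data).
Answer = list[tuple[str, str, str]]


def order_independent_addresses(qname: str, answer: Answer) -> list[str]:
    # Pass 1: index every CNAME by its owner name.
    aliases = {owner: data for owner, rtype, data in answer if rtype == "CNAME"}
    # Follow the chain from the query name, bailing out on a CNAME loop.
    name = qname
    seen = set()
    while name in aliases:
        if name in seen:
            return []  # loop: no usable answer
        seen.add(name)
        name = aliases[name]
    # Pass 2: collect the A records owned by the final name.
    return [data for owner, rtype, data in answer if owner == name and rtype == "A"]


reordered = [
    ("edge.example.net", "A", "192.0.2.1"),
    ("www.example.com", "CNAME", "edge.example.net"),
]
print(order_independent_addresses("www.example.com", reordered))  # ['192.0.2.1']
```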
It’s always DNS.