
Posted by hyperknot 10/26/2024

Understanding Round Robin DNS(blog.hyperknot.com)
394 points | 123 comments
solatic 10/27/2024|
Multiple A records is not for load balancing, a key component of which is full control over registering new targets and deregistering old targets in order to shift traffic. Because DNS responses are cached, you can't reliably use DNS to quickly shift traffic to new IP addresses, or use DNS to remove traffic from old IP addresses.
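
To make the caching point concrete, here is a minimal sketch (not from the thread; the hostname, addresses, and 300-second TTL are all made up) of how a caching resolver keeps serving a deregistered address until the TTL expires:

```python
import time

# Toy model of why DNS-based deregistration is slow: a caching resolver
# keeps serving the answer it already has until the record's TTL expires.
# Hostnames, addresses, and TTL values here are hypothetical.
class CachingResolver:
    def __init__(self, upstream):
        self.upstream = upstream   # name -> (addresses, ttl_seconds)
        self.cache = {}            # name -> (addresses, expiry_time)

    def resolve(self, name, now):
        if name in self.cache:
            addrs, expiry = self.cache[name]
            if now < expiry:
                return addrs       # cached answer, possibly stale
        addrs, ttl = self.upstream[name]
        self.cache[name] = (addrs, now + ttl)
        return addrs

upstream = {"app.example.com": (["192.0.2.10", "192.0.2.11"], 300)}
resolver = CachingResolver(upstream)
first = resolver.resolve("app.example.com", now=0)

# The operator "deregisters" 192.0.2.10, but cached clients keep
# receiving it until the 300-second TTL runs out.
upstream["app.example.com"] = (["192.0.2.11"], 300)
stale = resolver.resolve("app.example.com", now=100)   # still the old answer
fresh = resolver.resolve("app.example.com", now=400)   # TTL expired
```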

As OP clearly shows, it's also not useful for geographically routing traffic to the nearest endpoint. Clients are dumb and may do things against their interest, the user will suffer for it, and you will get the complaints. Use a DNS provider with proper georouting if this is important to you.

The only genuinely valid reason for multiple A records is redundancy. If you have a physical NIC, guess what, those fail sometimes. If you get a virtual IP address from a cloud provider, guess what, those abstractions leak sometimes. Setting up multiple servers with multiple NICs per server and multiple A records pointing to those NICs is one of those things you do when your use case requires a stratospherically high reliability SLA and you systematically start working through every last single point of failure in your hot path.
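
The client side of that redundancy is simply walking every resolved address until one connects. A sketch using Python's standard library (this is essentially what `socket.create_connection` already does for you):

```python
import socket

# Sketch of client-side redundancy over multiple A records: try every
# address getaddrinfo returns and use the first one that connects, so
# one dead NIC/IP costs only a connect timeout, not an outage.
def connect_any(host, port, timeout=3.0):
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock            # first reachable address wins
        except OSError as err:
            last_err = err
            sock.close()
    raise last_err or OSError(f"no usable address for {host}")
```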

neuroelectron 10/26/2024||
We used to do this at Amazon in the 00's for onsite hosts. At the time round robin DNS was the fastest way to load balance as even with dedicated load balancers of the time, the latency was a few milliseconds slower. A lot of the decisions didn't make sense to me and seemed to be grandfathered in from the 90's.

We had a dedicated DNS host and various other dedicated hosts for various services related to order fulfillment. A batch job would be downloaded in the morning to the order server (app) and split up amongst the symbol scanners which ran basic terminals. To keep latency as low as possible the scanners would dns round robin. I'm not sure how much that helped because the wifi was by far the biggest bottleneck simply for the fact of interference, reflection and so on.

With this setup an outage would have no effect on the throughput of the warehouse, since the batch job was all handled locally. As we moved toward same-day shipping this was of course no longer a good solution, and we moved to redundant dedicated fiber with cellular data backup, then almost completely remote servers for everything but app servers. So what we were left with was a million-dollar HVAC system to cool a quarter rack of hardware and a bunch of redundant onsite tech workers.

hypeatei 10/26/2024||
The browser behavior is really nice, good to know that it falls back quickly and smoothly. Round robin DNS has always been referred to as a "poor man's load balancer", which it seems to be living up to.

> Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.

This took two tries for me, which raises the question of how curl is keeping track of RTTs (round-trip times). Interesting.

freitasm 10/26/2024||
Interesting. The author starts by discussing DNS round robin but then briefly touches on Cloudflare Load Balancing.

I use this feature, and there are options to control Affinity, Geolocation and others. I don't see this discussed in the article, so I'm not sure why Cloudflare load balancing is mentioned if the author does not test the whole thing.

Their Cloudflare wishlist includes "Offline servers should be detected."

This is also interesting because when creating a Cloudflare load balancing configuration, you create monitors, and if one is down, Cloudflare will automatically switch to other origin servers.

These screenshots show what I see on my Load Balancing configuration options:

https://cdn.geekzone.co.nz/imagessubs/62250c035c074a1ee6e986...

https://cdn.geekzone.co.nz/imagessubs/04654d4cdda2d6d1976f86...

hyperknot 10/26/2024|
I briefly mention that I don't go into L7 Load Balancing because it'd be cost prohibitive for my use case (millions of requests).

Also, the article is about DNS-RR, not the L7 solution.

nielsole 10/26/2024||
> Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.

I always assumed curl was stateless between invocations. What's going on here?

barrkel 10/26/2024|
My hypothesis: he's running on macOS and is seeing the same behavior from Safari as from curl because they're both using OS-provided name resolution, which is doing the lowest-latency selection.

Firefox and Chrome use DNS over HTTPS by default I believe, which may mean they use a different name resolution path.

The above is entirely conjecture on my part, but the guess is heavily informed by the surprising curl behavior.

plagiat0r 10/27/2024|||
But this does not make sense. How is the macOS resolver supposed to test the latency of A records? A browser uses the resolved network address to actually make a TCP connection on port 443 and can measure latency there (or over UDP/443 when using HTTP/3/QUIC).

But the operating system resolver only speaks to DNS servers. It does not make HTTPS connections to measure latency and pick "the closest server". DNS also has no way to tell what port you will be using; maybe the service is on 8443 or something.

For geo DNS I've built a custom backend for PowerDNS with geo-DNS capabilities and health checks to quickly remove a broken VPS from DNS responses.

barrkel 10/27/2024||
If I had to hypothesize further, I'd say that macOS may let its DNS resolver cache interact with its TCP stack. It's not inconceivable that the TCP handshake is used to make a rough estimate of network latency.
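
A userspace approximation of that conjecture would be to time the TCP connect to each resolved address and prefer the fastest. To be clear, this sketch is purely illustrative; nothing here implies macOS actually does this:

```python
import socket
import time

# Speculative sketch: estimate per-address latency from the TCP
# handshake by timing connect() to every resolved address, then
# return the addresses sorted fastest-first.
def rank_by_connect_time(host, port, timeout=2.0):
    timings = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        start = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)   # handshake time approximates RTT
        except OSError:
            continue                  # unreachable address: leave it out
        timings.append((time.monotonic() - start, sockaddr))
    return [addr for _, addr in sorted(timings)]
```
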
plagiat0r 10/27/2024||
A bold hypothesis. The problem is, nowhere in the TCP handshake will you find the hostname (FQDN). And one IP can host hundreds of FQDNs.

No way macOS parses the TLS ClientHello looking for the SNI.

Also, I doubt a DNS resolver runs in the macOS kernel at ring 0 to pull this off.

The thing with DNS is that it effectively works at layer 3. Hold on, what? Yes, layer 3, because you obtain a network address for layer 3 (IPv4, IPv6), but latency can only be measured at layer 4 (TCP, QUIC). Of course I know common wisdom says DNS is layer 7, but from a functional perspective you have yet to establish your destination network address, so functionally it's like layer 3 to me. Or even lower, because without a destination you can't even start building a packet or inspecting your routing table entries to figure out whether you can reach it ;)

There is zero chance macOS resolver libraries can connect you to the fastest-responding server, unless there were no Berkeley sockets but instead something that let you call connect(char *fqdn) and have the system library hand back two pipes, one for writing and one for reading, that you could close independently. I doubt such a thing exists, but I don't know the macOS API.

hyperknot 10/26/2024|||
Correct. I'm on macOS and I tried turning off DoH in Firefox and then it worked like Safari.
mlhpdx 10/26/2024||
Interesting topic for me, and I’ve been looking at anycast IP services and latency based DNS resolvers as well. I even made a repo[1] for anyone interested in a quick start for setting up AWS global accelerator.

[1] https://github.com/mlhpdx/cloudformation-examples/tree/maste...

why-el 10/26/2024||
Hm, I thought Happy Eyeballs (HE) was mainly concerned with IPv6 issues and falling back to IPv4. I didn't think this was the RFC in which some words were finally said about round-robin specifically, but from this article it looks like it was.

Is it true then that before HE, most round-robin implementations simply cycled and no one considered latency? That's a very surprising finding.
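
For what it's worth, Python's asyncio exposes RFC 8305-style staggered connection racing directly (since Python 3.8); a sketch, where the 0.25 s delay is the value the RFC recommends and the host/port are placeholders:

```python
import asyncio

# Happy Eyeballs (RFC 8305) in practice: race connection attempts to
# the resolved addresses with a short stagger rather than trying them
# strictly in order and waiting for each full timeout.
async def he_connect(host, port):
    reader, writer = await asyncio.open_connection(
        host, port, happy_eyeballs_delay=0.25)
    return reader, writer
```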

LikeBeans 10/27/2024||
Another way to solve for clients that stick with an IP after resolving is to use a combination of DNS RR and Anycast (if you have control over the physical infra). That means you resolve with RR to an IP in the regional data center and then use Anycast for local delivery. That way if the server goes down these clients can continue to operate.
zamalek 10/26/2024||
Take a look at SRV records instead - they are very intentionally designed for this, and behave vaguely similarly to MX. Creating a DNS server (or a CoreDNS/whatever module) that dynamically updates weights based on backend metrics has been a pending pet project of mine for some time now.
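
For reference, RFC 2782's selection rule (lowest-priority group first, then weighted random choice within it) is short enough to sketch; the records and target names below are hypothetical:

```python
import random

# RFC 2782 SRV selection, sketched: take the lowest-priority group,
# then pick within it by weighted random choice.
def pick_srv(records, rng=random):
    lowest = min(priority for priority, _, _, _ in records)
    group = [r for r in records if r[0] == lowest]
    total = sum(weight for _, weight, _, _ in group)
    if total == 0:
        return rng.choice(group)
    point = rng.uniform(0, total)
    running = 0
    for record in group:
        running += record[1]      # record = (priority, weight, port, target)
        if point <= running:
            return record
    return group[-1]

records = [
    (10, 60, 443, "big.example.com"),    # gets roughly 60% of picks
    (10, 40, 443, "small.example.com"),  # gets roughly 40% of picks
    (20, 0, 443, "backup.example.com"),  # only if priority-10 hosts fail
]
```
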
jeroenhd 10/26/2024|
Until the HTTP spec gets updated to include SRV records, using SRV records for HTTP(S) is technically spec-incompliant and practically useless.

However, as is common with web tech, the old SRV record has been reinvented as the SVCB record with a smidge of DANE for good measure.

chasil 10/27/2024|
I actually use round robin into a set of ssh servers.

There is never a delay if one of them is down.

I am using a closed-source client (Bluezone Rocket), but I'm assuming that it pulled a lot of code from PuTTY as it uses the PPK format.
