YAML: The Norway Problem (2022)

Posted by carlos-menezes 2 days ago

YAML: The Norway Problem (2022)(www.bram.us)

219 points | 163 comments

azernik 2 days ago|

Even worse is the all-decimal MAC problem.

Some genius decided that, to make time input convenient, YAML would parse HH:MM:SS as SS + 60×MM + 60×60×HH. So you could enter 1:23:45 and it would give you the correct number of seconds in 1 hour, 23 minutes, and 45 seconds.

They neglected to put a maximum on the number of such sexagesimal places, so if you put, say, six numbers separated by colons like this, it would be parsed as a very large integer.

Imagine my surprise when, while working at a networking company, we had some devices which failed to configure their MAC addresses in YAML! After this YAML config file had been working for literal years! (I believe this was via netplan? It's been like a decade, I don't remember.)

Turns out, if an unquoted MAC address had even a single non-decimal hex digit, it would do what we expected (parse as a string). This is not only by FAR the more common case, but also we had an A in our vendor prefix, so we never ran into this "feature" during initial development.

Then one day we ran out of MAC addresses and got a new vendor prefix. This time it didn't have any letters in it. Hilarity ensued.

(This behavior has thankfully been removed in more recent YAML standards.)
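
For anyone who wants to see the arithmetic, here is a minimal sketch of the base-60 rule in plain Python. It is not a YAML parser: `resolve_sexagesimal` is a made-up name, and the regex is an assumption modeled on the base-60 integer form in YAML 1.1.

```python
import re

# Assumed pattern, modeled on the YAML 1.1 base-60 integer form:
# a leading group, then one or more ":NN" groups with NN in 0..59.
BASE60_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def resolve_sexagesimal(scalar: str):
    """Return an int if the scalar looks like a base-60 integer, else the string."""
    if not BASE60_INT.fullmatch(scalar):
        return scalar  # any MAC address containing a-f lands here
    sign = -1 if scalar.startswith("-") else 1
    total = 0
    for group in scalar.lstrip("+-").replace("_", "").split(":"):
        total = total * 60 + int(group)
    return sign * total


print(resolve_sexagesimal("1:23:45"))            # 5025 seconds
print(resolve_sexagesimal("52:54:00:12:34:56"))  # all-decimal MAC turns into an int
print(resolve_sexagesimal("52:54:00:ab:cd:ef"))  # stays a string
```

Nothing in the pattern caps the number of colon-separated groups, which is why a six-group MAC address qualifies.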

weinzierl 2 days ago||

Perl has a Poland Problem. The customary file extension for Perl files is *.pl. This worked well until Apache introduced content negotiation and the convention to add a language code as file extension. It had index.html.en, index.html.de, for example.

index.html.pl is where the problem started and the reason why the officially recommended file extension for Perl files used to be (still is?) *.plx.

I don't have the Camel book at hand, but Randal Schwartz's Learning Perl 5th edition says:

"Perl doesn't require any special kind of filename or extension, and it's better not to use an extension at all. But some systems may require an extension like plx (meaning PerL eXecutable); see your system's release notes for more information."

dtech 2 days ago||

That sounds more like an Apache problem than a Perl problem. It's their mistake and it's not even relevant outside an Apache context.

maxloh 2 days ago|||

That should be marked as a breaking change on Apache's side IMO. It would be a security nightmare if server code were leaked to the public.

weinzierl 2 days ago|||

It should have been an Apache problem, yes. Not only did it turn out that at least the language negotiation part of content negotiation wasn't the best idea, but the way Apache handled it was problematic apart from the pl problem. In the end the Perl community took the issue upon themselves, so historically I'd say it was a Perl problem (of choice).

ginko 2 days ago||

Also, Prolog has the Perl problem. :)

weinzierl 1 day ago||

Yes, I think Prolog had pl even before Perl came around, but Perl snatched it away.

gnabgib 2 days ago||

The YAML document from hell (566 points, 2023, 353 comments) https://news.ycombinator.com/item?id=34351503

That's a Lot of YAML (429 points, 2023, 478 comments) https://news.ycombinator.com/item?id=37687060

No YAML (Same as above) (152 points, 2021, 149 comments) https://news.ycombinator.com/item?id=29019361

mdaniel 2 days ago|

And some light commentary a few days ago: https://news.ycombinator.com/item?id=43648263 - Apr 2025 (51 comments)

pkkm 2 days ago||

Programming with string templates, in a highly complex and footgun-rich markup language, is one of the things I find most offputting about the DevOps ecosystem.

sph 2 days ago||

I believe Satan itself decided to mix YAML, Jinja and Turing-complete logic when it created Ansible. It truly is the sendmail of the modern era.

senderista 1 day ago|||

Several years ago when I was writing a deployment system for a cloud distributed database, I tried to automate everything with Ansible playbooks and the Ansible "API" (LOL). I pretty quickly gave up on implementing anything but the most trivial logic in templated YAML and switched to Python (wrapping maximally-dumb Ansible playbooks) for everything nontrivial.

Fizzadar 1 day ago|||

You might like pyinfra.

mdaniel 1 day ago||

Just about every time someone complains about ansible, there's a comment to plug this project, but pyinfra seems to opt out of the cloud provisioning part, instead delegating to its terraform connector, which drags in all the nonsense that entails. That makes it not only less useful but (IMHO) a horrible name for a project that only does "remote execution" and not infrastructure. The fact that it's even missing @aws @azure @gcp connectors further solidifies "who is the audience for this thing?"

sph 1 day ago|||

Not everyone runs cloud servers. pyinfra seems to fit my needs like a glove, so I guess I am the intended audience.

I never liked the provisioning overlap Ansible has with Terraform, so it makes sense to me: provision servers with tf, configure them with another tool, whether it’s ansible or pyinfra. Well, at least in theory.

sofixa 1 day ago|||

> Just about every time someone complains about ansible, there's a comment to plug this project but pyinfra seems to opt-out of the cloud provisioning part

Which Ansible is absolutely atrocious at, so that makes sense. Use the best tool for the job (so Terraform, maybe Pulumi/tfcdk if you hate your future self/future teammates) for infra.

nicktelford 1 day ago||

This is why I generally use Terraform for Kubernetes. It's not perfect, but it's miles better than the various different YAML-templating solutions (Kustomize, Helm) popular in the Kubernetes ecosystem.

mdaniel 1 day ago||

Two different stateful recordkeeping control planes with disparate opinions about the current state of the world. What can go wrong.

sofixa 1 day ago||

To be fair one of the most common ways of managing Kubernetes clusters and what is deployed on them is to use ArgoCD, which gives you the same issue of dual stateful control planes.

Even more fun is if you then run your Kubernetes cluster on top of a VM orchestrator such as vSphere, that way you have multiple layers of stateful control planes and compute orchestrators fighting each other.

anvandare 2 days ago||

"The limits of my keyboard mean the limits of my programming language."

If only they had had ⊥ and ⊤ somewhere on their keys to work with Booleans directly while designing the languages. In another branch of history, perchance.[1]

[1] https://en.wikipedia.org/wiki/APL_(programming_language)#/me...

tossandthrow 2 days ago||

⊥ and ⊤ are not entirely congruent to false and true.

Boolean and propositional logic are not the same.

Q6T46nT668w6i3m 1 day ago||

For _ordinary_ two‑valued classical propositional logic, e.g., YAML, they are congruent.

rusk 1 day ago||

I have an emacs macro for this

alkonaut 1 day ago||

Always quote all yaml strings. If you have a yaml file with an unquoted value that isn't a simple number or boolean (for example a date, time, ip-address, mac address, country code, phone number, server name, or configuration name), then you are asking for trouble. Just DON'T DO THAT. It's pretty simple.

"Yeah but it's so convenient"

"Yeah but the benefit of yaml is that you don't need quotes everywhere so that it's more human readable"

DON'T

ohgr 1 day ago|

Yeah that.

00,01,02,03,04,05,06,07,OH SHIT
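
The joke is the YAML 1.1 octal rule: a leading zero followed by digits 0-7 reads as an octal integer, so `07` is a number while `08` falls through to a string. A rough Python sketch of just that rule follows. The helper name is hypothetical, the two patterns are assumed simplifications, and real parsers check many more forms.

```python
import re

OCTAL_INT = re.compile(r"[-+]?0[0-7_]+")              # assumed YAML 1.1 octal form
DECIMAL_INT = re.compile(r"[-+]?(?:0|[1-9][0-9_]*)")  # assumed decimal form


def resolve_int_1_1(scalar: str):
    """Resolve a plain scalar roughly the way the YAML 1.1 integer rules would."""
    cleaned = scalar.replace("_", "")
    if OCTAL_INT.fullmatch(scalar):
        return int(cleaned, 8)
    if DECIMAL_INT.fullmatch(scalar):
        return int(cleaned, 10)
    return scalar  # not an integer under these two rules


for token in ["00", "01", "07", "08", "09", "10"]:
    print(token, "->", repr(resolve_int_1_1(token)))
```

In a list of zero-padded identifiers, the first eight entries come back as integers and the ninth as a string, so the column silently changes type partway through.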

whacko_quacko 2 days ago||

Pandas has a Nigeria problem, where NA -> NaN.

It's not that bad, because you can explicitly turn that behavior off, but ask me how I know =(

orangewindies 2 days ago||

That's a Namibia problem, Nigeria is NG.

whacko_quacko 23 hours ago||

Damn, you're right ^^' Thanks for pointing that out

trueismywork 2 days ago||

How?

whacko_quacko 22 hours ago||

For example when reading a CSV. Try using `read_csv` on a file that contains two letter country codes including NA.
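
A small sketch of both behaviours, assuming pandas is installed. The CSV content is made up for illustration; `keep_default_na` is the `read_csv` parameter that switches the default NA strings off.

```python
import io

import pandas as pd

# Hypothetical data: two-letter country codes, including Namibia's "NA".
csv_text = "country,code\nNamibia,NA\nNorway,NO\n"

# Default behaviour: "NA" is one of the strings treated as missing.
default = pd.read_csv(io.StringIO(csv_text))
print(default["code"].isna().tolist())

# With the default NA strings disabled, the codes survive as text.
literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(literal["code"].tolist())
```

If some columns really do contain missing-value markers, passing an explicit `na_values` list alongside `keep_default_na=False` keeps that handling without swallowing the country code.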

ashishb 2 days ago||

How often do people even encounter this issue? I have been using YAML for 5+ years and have never had it before. Further, I use `yamllint` which points this out as a lint issue "truthy value should be one of [false, true]".

tetha 2 days ago||

I don't recall encountering the norway problem in the wild.

Ansible has a pretty common issue with file permissions, because pretty much every numeric representation of a file mode is a valid number in YAML - and most of them are not what you want.

Sure, we can open up a whole 'nother can of worms about whether we should be programming infrastructure provisioning in YAML, but it's what we have. Chef with Ruby had much more severe issues once people started to abuse it.

Plus, ansible-lint flags that reliably.
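
A small sketch of why the mode number matters, using only Python's standard library. It assumes the value has already been turned into an integer the way a YAML 1.1 loader would do it; nothing here calls Ansible, and what a given tool does with the number afterwards is outside the sketch.

```python
import stat

# What the author meant: rw-r--r--
intended = 0o644

# `mode: 0644` unquoted: the leading zero makes a YAML 1.1 loader read it as octal.
from_leading_zero = int("0644", 8)

# `mode: 644` unquoted: read as decimal 644, which is a different bit pattern.
from_bare_decimal = int("644", 10)

print(stat.filemode(stat.S_IFREG | intended))
print(stat.filemode(stat.S_IFREG | from_leading_zero))
print(oct(from_bare_decimal), stat.filemode(stat.S_IFREG | from_bare_decimal))
```

Quoting the value (`mode: "0644"`) sidesteps the question, because the loader then passes a string through untouched.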

hinkley 2 days ago|||

Fractions are discriminatory when they happen to one individual or group every time or even just the first time.

See also p95 but the same couple of users always see the p99 time, due to some bug.

ashishb 1 day ago||

Indeed, based on the comments, it is a scissor-bug. Most people never encountered it while some encountered it a lot.

rat87 2 days ago|||

I have when getting an openapi yaml file from someone else.

jeltz 2 days ago|||

I have seen it twice but I work in Sweden where we often do things also for the Norwegian market.

speedgoose 2 days ago|||

I have encountered it once, though I live in Norway and worked in IT there for a decade.

mongol 2 days ago|||

Has been encountered where I work. A global website with lots of country-specific config.

Y-bar 2 days ago|||

Never experienced it for the past 10+ years since the bug was fixed in the spec.

Y_Y 1 day ago|||

I don't think false is truthy.

peanut-walrus 2 days ago||

I for one did encounter exactly this problem when configuring a list of countries via ansible for geoip whitelisting.
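
Roughly what happens to such a list, sketched in plain Python without a YAML library. The word sets follow the YAML 1.1 bool type (not every loader implements all of them), and the country list and helper name are hypothetical.

```python
# Spellings listed by the YAML 1.1 bool type.
TRUE_WORDS = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_WORDS = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_bool_1_1(scalar: str):
    """Return a bool for YAML 1.1 boolean spellings, else the original string."""
    if scalar in TRUE_WORDS:
        return True
    if scalar in FALSE_WORDS:
        return False
    return scalar


countries = ["SE", "DK", "NO", "FI"]  # hypothetical unquoted allow-list
print([resolve_bool_1_1(code) for code in countries])  # ['SE', 'DK', False, 'FI']
```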

xelxebar 2 days ago||

This has been fixed since 2009 with YAML 1.2. The problem is that everyone uses libyaml (_e.g._ PyYAML _etc._) which is stuck on 1.1 for reasons.

The 1.2 spec just treats all scalar types as opaque strings, along with a configurable mechanism[0] for auto-converting non-quoted scalars if you so please.

As such, I really don't quite grok why upstream libraries haven't moved to YAML 1.2. Would love to hear details from anyone with more info.

[0]:https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas
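
To illustrate the idea of a configurable schema, here is a toy resolver in Python. The rule tables are assumptions in the spirit of the 1.2 failsafe and core schemas, heavily simplified (no null or float handling), and none of the names come from a real library.

```python
import re

# Failsafe-style: no conversion rules, every plain scalar stays a string.
FAILSAFE_RULES = []

# Core-style (simplified): only true/false spellings and plain integers convert.
CORE_RULES = [
    (re.compile(r"true|True|TRUE"), lambda s: True),
    (re.compile(r"false|False|FALSE"), lambda s: False),
    (re.compile(r"[-+]?[0-9]+"), int),
    (re.compile(r"0o[0-7]+"), lambda s: int(s[2:], 8)),
]


def resolve(scalar: str, rules):
    """Apply the first matching rule; fall back to the untouched string."""
    for pattern, convert in rules:
        if pattern.fullmatch(scalar):
            return convert(scalar)
    return scalar


for token in ["no", "true", "0644", "0o644"]:
    print(token, repr(resolve(token, FAILSAFE_RULES)), repr(resolve(token, CORE_RULES)))
```

Under these rules `no` is just a string, and a leading zero no longer implies octal; swapping the rule table is the "configuration" part.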

xigoi 2 days ago||

I’m sad that the “fix” was to disallow “no” as a more readable alternative to “false”, rather than to disallow unquoted strings.

mckn1ght 2 days ago|||

It’s silly to have so many keyword synonyms as specified in that earlier regex. I’m also glad we can’t specify numeric literals as roman numerals. KISS

xigoi 2 days ago||

Honestly I’d prefer if “yes” and “no” were the only ways to spell the boolean values. They make sense in pretty much all contexts where booleans are used, whereas “true” and “false” rarely make sense.

dtech 2 days ago|||

Boolean algebra with true and false was well established decades before computers were invented

xigoi 1 day ago|||

Boolean algebra deals with logical propositions, not with configuration. The true/false terminology makes sense there.

tacker2000 2 days ago|||

In boolean logic true/false is ubiquitous and well known. As you can see, if one tries to be cute with it, one will get all sorts of issues. So at this point it doesn't make sense to use anything else.

xigoi 2 days ago||

The true/false terminology makes sense in boolean logic because you’re dealing with the truth of propositions. However, it does not make sense in the context of a configuration language, where there are no propositions that could be true or false.

umanwizard 1 day ago|||

It makes sense in the context of a configuration language because virtually 100% of programmers and other technical computer users understand “true” and “false” as the canonical Boolean values, and as far as I know that has always been the case. It never would have made sense to invent different unfamiliar terms like “yes” and “no” because of some niche philosophical distinction between “Boolean logic” and “configuration” that almost nobody in the real world cares about.

xigoi 1 day ago||

“yes” and “no” are “unfamiliar terms”? What the fuck? Everyone who knows even the basics of English knows what these words mean.

umanwizard 1 day ago||

They are familiar as English words, yes, but unfamiliar as terms of art for Boolean values in computing. It’d be like replacing “if” statements with “whenever” statements.

mckn1ght 1 day ago||

Don’t give them any ideas! They already tried to make inroads with ruby’s “unless”.

stevage 2 days ago|||

Huh, I never considered this. We take true and false for granted everywhere but they aren't the most meaningful.

xelxebar 2 days ago||||

The fix is to make conversion user-controllable. If you want to disallow bare scalars except for booleans and numbers or whatever, it's just a little bit of configuration away.

heavenlyblue 2 days ago||||

Why do you need an alternative spelling of false?

xigoi 2 days ago||

`logging: no` clearly says “I do not want logging”. `logging: false` is less explicit – what exactly is false?

jeltz 2 days ago|||

Then it should be on/off, not yes/no.

xigoi 1 day ago||

on/off also doesn’t make sense in many contexts, for example `isRegistered: on`.

qw 1 day ago|||

I often prefer enums over booleans for this. It seems more readable for most cases, and can be extended with new values.

This:

    isRegistered: true

could be replaced with

    accountStatus: "UNREGISTERED"
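
On the consuming side, a sketch of what that could look like in Python. The class, the extra `PENDING_VERIFICATION` member, and `parse_status` are all hypothetical names for illustration.

```python
from enum import Enum


class AccountStatus(Enum):
    REGISTERED = "REGISTERED"
    UNREGISTERED = "UNREGISTERED"
    PENDING_VERIFICATION = "PENDING_VERIFICATION"  # hypothetical later addition


def parse_status(raw: str) -> AccountStatus:
    """Map a config string to an enum member; unknown values raise ValueError."""
    return AccountStatus(raw)


print(parse_status("UNREGISTERED"))
```

A misspelled value fails loudly at load time instead of quietly becoming a truthy or falsy string.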

alkonaut 1 day ago||||

`logging: no` could also mean log in Norwegian. Or log only for the Norwegian region. That's the thing with too many keywords and optional quoting, you can't know.

And for this reason, "logging: false" would be clearer than "logging: no" to represent "I do not want logging".

xigoi 1 day ago||

`false` could be a code for something else just as well as `no`. For example, it could mean that I only want to see logs of false information appearing in the system. The only proper solution is to require quotes around strings.

qznc 2 days ago||||

Logging: ignore/print/file

Don’t use bool at all.

xigoi 1 day ago||

This, along with number formats, could be a good argument for strings being the only primitive type in config languages.

qznc 1 day ago||

I recently learned about NestedText: https://nestedtext.org/

While it has the YAML-like significant whitespace, it looks nice because it doesn't try to be clever.

paulddraper 1 day ago|||

Options are either

1. Specify in the key

  loggingEnabled: false

2. Specify in the value:

  logging: disabled

pydry 2 days ago|||

Yeah, you still get the same issue that 3 is an integer, 3.3 is a float and 3.3.3 is a string.

maxloh 2 days ago|||

Why wasn't that a major version bump, like YAML 2.0?

That sounds like a breaking change that caused old YAML documents to be parsed differently.

transfire 2 days ago||

Absolutely correct! Please correct me if I am wrong, but as far as I know, no one has implemented YAML completely according to spec.

The tag schema used is supposed to be modifiable, folks!

And why anyone would still be using 1.1 at this point is just forehead palming foolishness.

xelxebar 1 day ago||

AFAIK, libfyaml[0] (not to be confused with libyaml) passes the 1.2 test suite, and if I remember correctly, it's currently the only YAML loader with that claim.

The yaml.org website also lists a bunch of implementations, with about 1/3 supporting 1.2. I'm guessing that the users of those libraries just happily hum along and we never hear from them!

The issue is that downstream consumers of popular languages with a vocal community here on HN tend to just pull in libyaml, PyYAML being the major offender in my mind.

[0]:https://github.com/pantoniou/libfyaml

quechimba 2 days ago|

We had this issue many years ago when people from Norway couldn't sign up. Took us a while to figure out.

dmckeon 2 days ago||

Narrow escape for people from Yemen (YE).

magicalhippo 2 days ago|||

As a Norwegian I'm very curious, where in the pipeline were you using YAML? And why?

I've only seen it used for configuration.

StableAlkyne 2 days ago|||

I've seen teams use it as a replacement for JSON because it has the perception of being more "modern"

bornfreddy 1 day ago|||

While JSON is annoying because it lacks some pretty basic features (comments, trailing comma), at least its spec is short. YAML is huuuge - there are way too many ways to do the same thing.

quechimba 1 day ago||||

We were using a fork of https://github.com/carmen-ruby/carmen/tree/master/iso_data/b... with our own translations. We used the data in the signup form.

tough 2 days ago|||

usually locale's paths gone wrong

TZubiri 2 days ago|||

I usually think of yaml for internal config files, would never think of yaml for user data.

Don't ask me why though, might have something to do with how it's written like a python file, no user would want to write their data in yaml format.

nurgasemetey 2 days ago||

Probably, OP didn't keep user data in YAML, but I think there was config that kept allowed countries to sign up.

duxup 2 days ago||

Or were they from Noway ...
