Posted by carlos-menezes 4/12/2025

YAML: The Norway Problem (2022)(www.bram.us)
222 points | 164 comments | page 2
xelxebar 4/13/2025|
This has been fixed since 2009 with YAML 1.2. The problem is that everyone uses libyaml (_e.g._ PyYAML _etc._) which is stuck on 1.1 for reasons.

The 1.2 spec just treats all scalar types as opaque strings, along with a configurable mechanism[0] for auto-converting non-quoted scalars if you so please.

As such, I really don't quite grok why upstream libraries haven't moved to YAML 1.2. Would love to hear details from anyone with more info.

[0]:https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas
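
To make the 1.1 versus 1.2 difference concrete, here is a rough standard-library Python sketch. The two patterns follow the boolean forms listed in the YAML 1.1 bool type and in the YAML 1.2 core schema; the helper names are invented for illustration, and this is not how libyaml or PyYAML is actually implemented.

```python
import re

# Boolean forms listed by the YAML 1.1 "bool" type.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# Boolean forms listed by the YAML 1.2 core schema.
YAML12_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_bool_yaml11(scalar: str) -> bool:
    """True if an unquoted scalar matches the YAML 1.1 boolean pattern."""
    return YAML11_BOOL.fullmatch(scalar) is not None


def is_bool_yaml12_core(scalar: str) -> bool:
    """True if an unquoted scalar matches the YAML 1.2 core schema boolean pattern."""
    return YAML12_CORE_BOOL.fullmatch(scalar) is not None


# Compare how each rule set classifies a few unquoted scalars.
for scalar in ["NO", "no", "off", "false", "Norway"]:
    print(scalar, is_bool_yaml11(scalar), is_bool_yaml12_core(scalar))
```

Under the 1.1 pattern the country code `NO` is a boolean; under the 1.2 core schema pattern it falls through and stays a string.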

xigoi 4/13/2025||
I’m sad that the “fix” was to disallow “no” as a more readable alternative to “false”, rather than to disallow unquoted strings.
mckn1ght 4/13/2025|||
It’s silly to have so many keyword synonyms as specified in that earlier regex. I’m also glad we can’t specify numeric literals as roman numerals. KISS
xigoi 4/13/2025||
Honestly I’d prefer if “yes” and “no” were the only ways to spell the boolean values. They make sense in pretty much all contexts where booleans are used, whereas “true” and “false” rarely make sense.
dtech 4/13/2025|||
Boolean algebra with true and false was well established decades before computers were invented
xigoi 4/13/2025|||
Boolean algebra deals with logical propositions, not with configuration. The true/false terminology makes sense there.
tacker2000 4/13/2025|||
In boolean logic true/false is ubiquitous and well known. As you can see, if one tries to be cute with it, one will get all sorts of issues. So at this point it doesn't make sense to use anything else.
xigoi 4/13/2025||
The true/false terminology makes sense in boolean logic because you’re dealing with the truth of propositions. However, it does not make sense in the context of a configuration language, where there are no propositions that could be true or false.
umanwizard 4/13/2025|||
It makes sense in the context of a configuration language because virtually 100% of programmers and other technical computer users understand “true” and “false” as the canonical Boolean values, and as far as I know that has always been the case. It never would have made sense to invent different unfamiliar terms like “yes” and “no” because of some niche philosophical distinction between “Boolean logic” and “configuration” that almost nobody in the real world cares about.
xigoi 4/13/2025||
“yes” and “no” are “unfamiliar terms”? What the fuck? Everyone who knows even the basics of English knows what these words mean.
umanwizard 4/13/2025||
They are familiar as English words, yes, but unfamiliar as terms of art for Boolean values in computing. It’d be like replacing “if” statements with “whenever” statements.
mckn1ght 4/13/2025||
Don’t give them any ideas! They already tried to make inroads with ruby’s “unless”.
stevage 4/13/2025|||
Huh, I never considered this. we take true and false for granted everywhere but they aren't the most meaningful.
xelxebar 4/13/2025||||
The fix is to make conversion user-controllable. If you want to disallow bare scalars except for booleans and numbers or whatever, it's just a little bit of configuration away.
heavenlyblue 4/13/2025||||
Why do you need an alternative spelling of false?
xigoi 4/13/2025||
`logging: no` clearly says “I do not want logging”. `logging: false` is less explicit – what exactly is false?
jeltz 4/13/2025|||
Then it should be on/off, not yes/no.
xigoi 4/13/2025||
on/off also doesn’t make sense in many contexts, for example `isRegistered: on`.
qw 4/13/2025|||
I often prefer enums over booleans for this. It seems more readable for most cases, and can be extended with new values.

This:

    isRegistered: true
could be replaced with

    accountStatus: "UNREGISTERED"
alkonaut 4/13/2025||||
Logging: no could also be log in Norwegian. Or log only for the Norwegian region. That's the thing with too many keywords and optional quoting, you can't know.

And for this reason, "logging: false" would be clearer than "logging: no" to represent "I do not want logging".

xigoi 4/13/2025||
`false` could be a code for something else just as well as `no`. For example, it could mean that I only want to see logs of false information appearing in the system. The only proper solution is to require quotes around strings.
paulddraper 4/13/2025||||
Options are either

1. Specify in the key

  loggingEnabled: false
2. Specify in the value:

  logging: disabled
qznc 4/13/2025|||
Logging: ignore/print/file

Don’t use bool at all.

xigoi 4/13/2025||
This, along with number formats, could be a good argument for strings being the only primitive type in config languages.
qznc 4/13/2025||
I recently learned about NestedText: https://nestedtext.org/

While it has the YAML-like significant whitespace, it looks nice because it doesn't try to be clever.

pydry 4/13/2025|||
Yeah, you still get the same issue that 3 is an integer, 3.3 is a float and 3.3.3 is a string.
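
A rough sketch of why that happens: a plain scalar is tried against a pattern per type and falls back to a string when nothing matches. The two patterns below are the decimal integer and float forms from the YAML 1.2 core schema (octal, hex, infinity, NaN, null and booleans are left out for brevity), and `resolve_plain_scalar` is a hypothetical name, not a real library function.

```python
import re

# Decimal integer and float forms from the YAML 1.2 core schema.
INT = re.compile(r"[-+]?[0-9]+")
FLOAT = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")


def resolve_plain_scalar(text: str):
    """Return an int, a float, or the unchanged string for an unquoted scalar."""
    if INT.fullmatch(text):
        return int(text)
    if FLOAT.fullmatch(text):
        return float(text)
    # Anything that matches neither pattern stays a string.
    return text


for scalar in ["3", "3.3", "3.3.3"]:
    value = resolve_plain_scalar(scalar)
    print(repr(scalar), "->", type(value).__name__)
```

A version string such as `3.3.3` matches neither pattern, so it is the only one of the three that keeps its string type.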
maxloh 4/13/2025|||
Why wasn't that a major version bump, like YAML 2.0?

That sounds like a breaking change that rendered old YAML documents to be parsed differently.

transfire 4/13/2025||
Absolutely correct! Please correct me if I am wrong, but as far as I know, no one has implemented YAML completely according to spec.

The tag schema used is supposed to be modifiable folks!

And why anyone would still be using 1.1 at this point is just forehead palming foolishness.

xelxebar 4/14/2025||
AFAIK, libfyaml[0] (not to be confused with libyaml) passes the 1.2 test suite, and if I remember correctly, it's currently the only YAML loader with that claim at the moment.

The yaml.org website also lists a bunch of implementations, with about 1/3 supporting 1.2. I'm guessing that the users of those libraries just happily hum along and we never hear from them!

The issue is that downstream consumers of popular languages with a vocal community here on HN tend to just pull in libyaml, PyYAML being the major offender in my mind.

[0]:https://github.com/pantoniou/libfyaml

nnurmanov 4/13/2025||
Another solution is to change the country name:)
gunalx 4/13/2025||
No way.
hinkley 4/13/2025|||
Or in New Zealand: Nor way.
rwoerz 4/13/2025|||
Neitherway
gunalx 4/13/2025||
Though we have renamed amino acids, I think it was. Because Microsoft Excel switched the original names to months.
madcaptenor 4/13/2025||
Genes, not amino acids.

https://www.theverge.com/2020/8/6/21355674/human-genes-renam...

gunalx 4/14/2025||
Thanks for the correction.
hgomersall 4/13/2025||
IMO the proposed solution of StrictYAML + schema is the right one here and what we use extensively for human readable configs. StrictYAML (linked to in the post) is essentially a string-type-only restriction of YAML, so you impose your type coercion on the parsed data structure.
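
The idea can be sketched in plain Python, assuming the parser has already produced a mapping in which every value is a string. This is not StrictYAML's API; `SCHEMA`, `parse_bool` and `apply_schema` are hypothetical names used only to show where the type coercion happens.

```python
def parse_bool(text: str) -> bool:
    """Accept exactly one spelling per boolean value and reject everything else."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {text!r}")


# The schema, not the parser, decides which keys are typed and how.
SCHEMA = {"country": str, "port": int, "logging": parse_bool}


def apply_schema(raw: dict, schema: dict) -> dict:
    """Convert a string-only mapping into typed values, rejecting unknown keys."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise KeyError(f"unexpected keys: {sorted(unknown)}")
    return {key: convert(raw[key]) for key, convert in schema.items()}


# What a string-only parser would hand over for a small config file.
raw_config = {"country": "NO", "port": "8080", "logging": "false"}
print(apply_schema(raw_config, SCHEMA))
```

Because `country` is declared as a string, `NO` can never be turned into a boolean, whatever the parser thinks of the spelling.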
vander_elst 4/13/2025|
If you have a schema, why not directly use something like protobufs?
LelouBil 4/13/2025||
The comment you replied to talks about human readable configs
Mesopropithecus 4/13/2025||
Like this? https://protobuf.dev/reference/protobuf/textformat-spec/
mdaniel 4/13/2025||
https://protobuf.dev/reference/protobuf/textformat-spec/#:~:...

and, setting that aside, the very next paragraph says that this is a legit representation of -2.0 which means something has gone gravely wrong

  value: -
    # change this to 3.14 one day
    2.0
raffraffraff 4/13/2025||
Why not just use quotes all the time for strings?
kergonath 4/13/2025||
Because that’s annoying. YAML is often written and read by humans. If you want a verbose and more regular way to do it, there is always JSON. But JSON is really annoying to deal with for humans, although it is much better than YAML for several applications.
hnlmorg 4/13/2025||
You don’t actually need quotes to define a string in YAML. Eg the following syntax

   User:
     Name: >-
       Bob
     Phone: >-
       01234 56789
     Description:>-
       This is a
       multi line
       description 
That’s both readable and parses your records as strings.

Edit: This Stack Overflow link provides more details https://stackoverflow.com/questions/3790454/how-do-i-break-a...

mdaniel 4/13/2025||
I can't tell if it's irony or not given the sentiment in this thread, but that is not a declaration of a multiline Description field, that's a field of User named "Description:>-" that happens to be missing its trailing ":"

Seeing that used systemically, versus just for "risky" fields makes me want to draw attention to the fantastic remarshal tool[1], which offers a "--yaml-style >" (and "|" and the rest) which will render yaml fields quoted as one wishes

1: https://github.com/remarshal-project/remarshal#readme and/or $(brew install remarshal)

hnlmorg 4/13/2025||
> I can't tell if it's irony or not given the sentiment in this thread, but that is not a declaration of a multiline Description field, that's a field of User named "Description:>-" that happens to be missing its trailing ":"

The trailing ‘:’ was there right after the ‘n’.

Examples of this syntax:

https://github.com/lmorg/murex/blob/master/builtins/core/arr...

I do agree it’s a bit of a kludge. But if you want data types and unquoted strings then anything you do to the syntax to denote strings over other data types then becomes a bit of a kludge.

The one good thing about this kludge is it allows for string literals (ie no complicated escaping rules).

> Seeing that used systemically, versus just for "risky" fields makes me want to draw attention to the fantastic remarshal tool[1], which offers a "--yaml-style >" (and "|" and the rest) which will render yaml fields quoted as one wishes

I don’t really understand what you’re alluding to here.

mdaniel 4/13/2025||
Tell us that you didn't try to use that example without telling us you just eyeballed the post

    $ /usr/local/opt/ansible/libexec/bin/python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read()))' <<YML
      User:
         Name: >-
           Bob
         Phone: >-
           01234 56789
         Description:>-
           This is a
           multi line
           description
    YML
    yaml.scanner.ScannerError: while scanning a simple key
      in "<unicode string>", line 6, column 6:
             Description:>-
    $ gojq --yaml-input . <<YML
          User:
             Name: >-
               Bob
             Phone: >-
               01234 56789
             Description:>-
               This is a
               multi line
               description
    YML
    gojq: invalid yaml: <stdin>:6
        6 |          Description:>-
            ^  could not find expected ':'
That's because, for better or worse, yaml considers that a legitimate key name, just missing its delimiter

    $ gojq --yaml-input . <<YML
          User:
             Name: >-
               Bob
             Phone: >-
               01234 56789
             Description:>-:
               This is a
               multi line
               description
    YML
    {
      "User": {
        "Description:>-": "This is a multi line description",
        "Name": "Bob",
        "Phone": "01234 56789"
      }
    }

This exchange in a thread complaining about the whitespace sensitivity doesn't escape me

As for remarshal, it was the systemic application of that quoting style that made me think of it, since writing { Name: >- Bob} is the worst of both worlds: not as legible as the plain unquoted version, not suitable for grep, and indentation sensitive

hnlmorg 4/13/2025||
The issue is the lack of white space between the : and the >, not a missing : at the end. I’m typing this on my phone so the odd syntax error might creep in but the key pointer is the examples I linked to and the block token I’ve described.

Further to that point, none of the example links I’ve shared have the : at the end and I have production code that works using the formatting I’ve described. So you’re flat out wrong there with your assumption that block keys always terminate with :

> As for remarshal, it was the systemic application of that quoting style that made me think of it, since writing { Name: >- Bob} is the worst of both worlds: not as legible as the plain unquoted version, not suitable for grep, and indentation sensitive

You wouldn’t write code like that because >- denotes a block and you’re now inlining a string.

I mean I’ve shared links explaining how this works and you’re clearly not reading them.

At the end of the day, I’m not going to argue that >- (and its ilk) solves everything. It clearly doesn’t. If you want to write “minimized” YAML using JSON syntax then you’re far far better off quoting the string.

But if you are writing a string in YAML and either don’t want to deal with quotation marks, or need that string to be a string literal (ie not having to escape things like quotation marks) then my suggestion is an option.

It’s not there as a silver bullet but it is a lesser known feature of YAML. Hence me sharing.

Now go read the links and understand it better. You might genuinely find it useful under some scenarios ;)

mdaniel 4/13/2025||
I enjoy that you're scolding me about 'not reading' after doubling down the accuracy of your initial post, which, yes, I can easily imagine you did from your phone

And yet I brought receipts for my claims, and you just bring "reed the manul, n00b"

hnlmorg 4/13/2025||
Firstly, I didn't say "read the manual", I said "read the links I shared". And that's a pretty reasonable comment to make given I took the time to find examples knowing that I couldn't easily type them out on my phone. And if you bothered to open the links you'd realize they were brief and to the point. I was actually trying to be helpful.

Secondly, your "receipts" were incorrect. Neither of your examples follows the examples I cited, and your second example creates a key named "Description:>-", which is clearly wrong. Hence why ">-" needs to be after the colon.

Here are more examples and evidence of how to use >- and why your "receipts" were also incorrect:

https://go.dev/play/p/1B4ba-dUARq

Here you can clearly see my example:

    Foo: >-
      hello
      world
produces:

    { "Foo": "hello world" }
which is correct.

Whereas your example:

    Bar:>-:
      hello
      world
produces

    { "Bar:\u003e-": "hello world" }
which is incorrect.

----

One final point: I don't understand why you're being so argumentative here. I posted a lesser-known YAML feature in case it helps some people and you've turned it into some kind of pissing match based on bad-faith interpretations of my comments. There was no need for you to do that.

kinow 4/13/2025|||
I guess sometimes it is out of your control. I work on a workflow manager where users specify their workflows with YAML. So there's little we can do to prevent them from writing things like no, n, t in a place where it could cause an issue like the one in the article.
zelphirkalt 4/13/2025||
Ah, the many places that choose to use YAML for no good reason...
kinow 4/13/2025||
Yeah, can't say much about it as I joined after they had already decided on using YAML.
mystifyingpoi 4/13/2025||
I like that in concept, but 1) literally no one does that (prime example - Kubernetes docs) and 2) it looks much more messy with quotes, when you know that they are unnecessary in 95% of cases.
zelphirkalt 4/13/2025||
Oh, I did that in Ansible stuff. Using quotes for all strings. Exactly because I know what a mess YAML is.
mystifyingpoi 4/14/2025||
Fair enough. I'm tempted to try it. There should be probably a yamllint rule for this...
firesteelrain 4/12/2025||
This problem occurs because PyYAML's load() uses the full YAML 1.1 schema. There is another loader, BaseLoader, that interprets every scalar as a string, which is the workaround that the article suggests. Just another way to achieve it.

It’s a bit of a sore spot in the YAML community as to why PyYAML can’t / won’t support YAML 1.2. It was in maintenance mode for a while. YAML 1.2 also introduced breaking changes.

From a SO comment: “ As long as you're okay with the YAML 1.1 standard, PyYAML is still perfectly fine, secure, etc. If you want to support the YAML 1.2 spec (released in 2009), you can use ruamel.yaml, which started out as a fork of PyYAML. – CrazyChucky Commented Mar 26, 2023 at 20:51”

- https://stackoverflow.com/q/75850232

gschizas 4/13/2025||
I wish that ruamel.yaml had better documentation. I've had to dive into the code so many times to find out how to do something.
rat87 4/13/2025||
Yeah, it's a problem. I had to put up a PR on a tool I was using because I ran into the Norway problem in YAML I was getting from another team. I did ask them to add quotes, just in case.
firesteelrain 4/13/2025||
A supplier we contracted with and we gave requirements to asked me what format do we want the export/import of the data to be in and I said JSON. It’s simple, easy and can be converted into anything else very easily
kazinator 4/13/2025||
In Lisp, if you want to read text into symbols (e.g. file of words), you just switch to a dedicated package in which those symbols are interned. Then if NIL happens to come up, it will be a symbol named "NIL" in that package, unrelated to the special object.
dissent 4/13/2025||
I reckon if this is really a big concern for anybody, then they are probably writing way too much YAML to begin with. If you're being caught out by things like this and need to debug it, then it maps very cleanly to types in most high level languages and you can generate your YAML from that instead.
makeitdouble 4/13/2025||
Sadly you usually realize you've been writing too much YAML way past the turning point, and it will be a pain to move a single file to JSON for instance when you have a whole process and system that otherwise ingest YAML, including keeping track of why this specific part of JSON and not YAML.

So people work around the little paper cuts, while still hitting the traps from time to time as they forget them.

> generate YAML

I've a hard time finding a situation where I'd want to do that. Usually YAML is chosen for human readability, but here we're already in a higher level language first. JSON sounds like a more appropriate target most of the time?

charrondev 4/13/2025|||
Isn’t yaml a strict superset of JSON? Any compliant YAML parser should be able to ingest a JSON document.
throwawaymaths 4/13/2025|||
https://metacpan.org/pod/JSON::XS#JSON-and-YAML
charrondev 4/13/2025||
> I have been pressured multiple times by Brian Ingerson (one of the authors of the YAML specification) to remove this paragraph, despite him acknowledging that the actual incompatibilities exist. As I was personally bitten by this "JSON is YAML" lie, I refused and said I will continue to educate people about these issues, so others do not run into the same problem again and again. After this, Brian called me a (quote)complete and worthless idiot(unquote).

> In my opinion, instead of pressuring and insulting people who actually clarify issues with YAML and the wrong statements of some of its proponents, I would kindly suggest reading the JSON spec (which is not that difficult or long) and finally make YAML compatible to it, and educating users about the changes, instead of spreading lies about the real compatibility for many years and trying to silence people who point out that it isn't true.

> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.

Well that’s disappointing.

alabastervlog 4/13/2025||
This explains some things on, like, a mythic level, that I’ve felt about yaml practically since the first time I saw it.

I guess software are human texts after all.

mannykannot 4/13/2025||||
Are there no cases where well-formed JSON could be subject to the problems covered in the article, when parsed by a compliant YAML parser? I'm asking because I know nothing about YAML and not much more about JSON.
charrondev 4/13/2025|||
Not that I know. JSON requires strings to be quoted which is basically the problem here. Of course it’s not a great human writable configuration format (no comments being a huge problem).

I’m just pointing out that it should be very simple to swap a YAML file for a JSON file in any system that accepts YAML

makeitdouble 4/13/2025|||
JSON is stricter than YAML so that class of issues is avoided.
makeitdouble 4/13/2025||||
Yes. Rewriting a YAML file into strict JSON won't have any impact on the ingestion or the processing of it.
dissent 4/13/2025|||
There are probably two use cases.

Configuration files for programs. These tend to be short.

DSLs which are large manifests for things like cloud infrastructure. These tend to be long, they grow over time.

My pet hypothesis is these DSLs exist mostly for neutrality - the vendor can't assume you have Python or something present. But as a user, you can assume just that and gain a lot by authoring in a proper language and generating YAML.

See https://github.com/cloudtools/troposphere for a great example for AWS CloudFormation.

bigstrat2003 4/13/2025|||
> Configuration files for programs. These tend to be short.

This is where I use YAML and it shines there. IMO easier to read and write by hand than JSON, and short sweet config files don't have the various problems people run into with YAML. It's great.

makeitdouble 4/13/2025|||
I can't run the examples right now, but looking at the last "print(template.to_json())" line, looks like the main use case is JSON ?

On cloud infra, yes, having one or two layers of languages is a natural situation. GCP and AWS both accepting (encouraging?) JSON as a subset of YAML makes it a simpler choice when choosing an auto generating target.

You mention people wanting to author the generated files, I think in other situations tweaking the auto-generated files will be seen as riskier with potential overwriting issues, so lower readability will be seen as a positive.

dissent 4/13/2025||
That's the point really, you can generate JSON or YAML and it doesn't really matter. If you want to include 100 similar objects in that output, you can use a for loop. You can't do that in plain JSON/YAML.
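
A minimal sketch of that approach, using only the Python standard library and emitting JSON. The manifest layout, the `make_service` helper and the field names are all made up for illustration and do not correspond to any real tool's format.

```python
import json


def make_service(index: int) -> dict:
    """Build one of many near-identical entries (hypothetical fields)."""
    return {
        "name": f"worker-{index:03d}",
        "port": 8000 + index,
        "logging": False,  # a real boolean, so no yes/no/NO ambiguity
    }


# The loop lives in the host language; the output is plain data.
manifest = {"services": [make_service(i) for i in range(100)]}
document = json.dumps(manifest, indent=2)

print(len(manifest["services"]), "services,", len(document), "characters of JSON")
```

Since the serializer quotes every string and writes booleans as `true`/`false`, the generated file does not rely on anyone remembering which bare words are special.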
dev_l1x_be 4/13/2025||
True. YAML is an intermediate representation between my intention expressed in Dhall and what runs in production.

https://github.com/dhall-lang/dhall-kubernetes

thund 4/13/2025||
I like using tags to avoid any doubt

!!bool

https://dev.to/kalkwst/a-gentle-introduction-to-the-yaml-for...

ajuc 4/13/2025||
YAML is just doing too much and trying to be too clever.
cirwin 4/13/2025|
I’ve been working on https://conl.dev, which fixes/removes YAML's problematic features.

Trying to find a tag-line for it I like, maybe “markdown for config”?
