Posted by carlos-menezes 5 days ago
Robustness has a meaning and it refers to handling bad inputs gracefully. An example of a lack of robustness is allowing a malicious actor to execute arbitrary code by supplying a datum larger than some buffer limit.
Trying to make sense of invalid inputs and do something with them isn't robustness. It's just an example of making an extension to a spec. The extension could be robust or not.
Postel's Law amounts to "have extensions and hacks to handle incorrectly formatted data, rather than rejecting them." So, OK, yes, that entails being robust to certain bad inputs that are outside of the spec but which land on one of the extensions. It doesn't entail being robust to inputs that fall outside of both the core spec and all the hacks/extensions.
Cherry picking certain bad inputs and giving them a meaning isn't, by itself, bona fide robustness; robustness means handling all bad inputs without crashing or allowing security to be compromised.
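To make the distinction concrete, here is a minimal Python sketch for a hypothetical text protocol whose spec says header lines end in CRLF. Accepting a bare LF is the Postel-style extension: one particular class of bad input is given a meaning (HTTP/1.1 parsers are allowed to make the same allowance). The length limit and the rejection of everything else are the robustness part. `read_header_line` and `MAX_LINE` are illustrative names, not from any real library.

```python
MAX_LINE = 8192  # hypothetical per-line limit; a real protocol would specify or configure it


def read_header_line(buf: bytes, start: int) -> tuple[bytes, int]:
    """Return (line, offset_of_next_line) for the header line starting at `start`.

    Spec:       lines end in CRLF.
    Extension:  a bare LF is also accepted ("be liberal").
    Robustness: every other malformed input is rejected with ValueError
                instead of being guessed at or allowed to grow without bound.
    """
    end = buf.find(b"\n", start)
    if end == -1:
        raise ValueError("unterminated header line")
    line = buf[start:end]
    if line.endswith(b"\r"):
        line = line[:-1]  # the conforming CRLF case
    if len(line) > MAX_LINE:
        raise ValueError("header line exceeds MAX_LINE")
    if b"\r" in line:
        raise ValueError("stray CR inside header line")
    return line, end + 1
```

The extension only widens what counts as a terminator; it does nothing for oversized or unterminated input, which still has to be rejected explicitly.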
In a distributed non-adversarial setting, this is exactly what you want for robustness.
The problem, as we've come to realise in the time since Postel's law was formulated, is that there is no such thing as a distributed non-adversarial setting. So I get what you're saying.
But your definition of robustness is too narrow as well. There's more to robustness than security. When Outlook strips out a certificate from an email for alleged security reasons, then that's not robustness, that's the opposite, brokenness: You had one job, to deliver an attachment from A to B, and you failed.
Robustness and security can be at odds. It's quite OK to say, "on so and so occasion I choose to make the system not robust, because the robust solution would not be sufficiently secure".
Ouch, no. Dragons be there. Famous last words.
The only area in which it is acceptable to reason this way is graphical user interfaces (and only if you've already provided an API for reliable automation, so that nobody has to automate the application through its GUI). I say graphical because, no, not in command interfaces.
Even in the area of GUIs, new heuristics about intent cause annoyances to the users. But only annoyances and nothing more.
For example, you update your operating system, and now the window manager thinks that whenever you move a window so that its title bar happens to touch the top of the screen, you must be indicating the intent to maximize it.
I suppose the ship has sailed now that people are deploying LLMs all over the place, and those things intuit intent. They are like Postel's Law on amphetamines. There is a big cost to it, like warming the planet, and the systems become fragile for their lack of specification.
> When Outlook strips out a certificate from an email for alleged security reasons
I would say it's being liberal in what it accepts, if it's an alternative to rejecting the e-mail for security reasons.
It has taken a datum with a security problem and "fixed" it, so that it now looks like a datum without that security problem.
(I can't find references online to this exact issue that you're referring to, so I don't have the facts. Are you talking about incoming or outgoing? Is it a situation like an expired or otherwise invalid certificate not being used when sending an outgoing mail? That would be "conservative in what you send/do".)
Or to put it otherwise, Postel was right to begin with, albeit perhaps just a little too cryptic, and has been frequently misquoted and misinterpreted ever since.
This is especially ironic given that the constructive argument against Postel’s law is generally based on the value of a strict interpretation of specification. If you’re intentionally omitting half of the law, then you have an implementation problem.
Furthermore, none of this has much to do with YAML being a shitty design.
The world doesn't need principles that are half good.
That could undoubtedly lead to misapprehension. As the GP indicated, and as the word itself means, it references systemic stability and self-preservation behaviour. Reciprocally, however, the obligation to be liberal absolutely does not mean absolving faulty inputs of their flaws. For example, it would not excuse a dud response to an SSH handshake, such as trying to negotiate RC4. Both Steve Crocker and Eric Allman have been at pains to unpack the understanding of robustness, forgiveness, and format canonicalisation in a security context, and they're hardly wrong. It's also why I particularly advocate the "do", not the "send", formulation: "do" is a much more systemic and contextual verb in its consequences for implementation.
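A minimal Python sketch of the SSH point, assuming cipher selection as RFC 4253 section 7.1 describes it (the first algorithm on the client's list that the server also supports). The refusal set is a hypothetical local policy: tolerating a client that lists RC4 (`arcfour*`) first is fine, but actually agreeing to RC4 is not.

```python
# Cipher names as they appear in SSH name-lists. Refusing these is a local
# policy choice for this sketch, not something RFC 4253 itself mandates.
REFUSED = {"arcfour", "arcfour128", "arcfour256"}


def negotiate_cipher(client_offer: list[str], server_supported: list[str]) -> str:
    """Return the first cipher on the client's list that the server both
    supports and is willing to use."""
    usable = {c for c in server_supported if c not in REFUSED}
    for name in client_offer:
        if name in usable:
            return name
    # Being "liberal" does not mean falling back to RC4: fail the handshake.
    raise ValueError("no mutually acceptable cipher")
```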
> staying away from generating inputs for other programs that exercise dark corner cases
This is exactly the kind of focus-solely-on-the-wire misdirection that I identified above as a common misinterpretation. Conforming to the most precise and unambiguous interpretation of a protocol, if there is one, with regard to what an implementation puts on the wire, can certainly be part of that, but that isn't always what being conservative looks like, and processing is equally if not more important.
The introduction of Explicit Congestion Notification (ECN), aka RFC 3168 (2001), springs to mind. RFC 791 (1981) defined bits 14 & 15 of the IPv4 header as "reserved for future use" and diagrammatically showed them as zero. RFC 1349 (Type of Service, 1992, now obsolete) named them "MBZ" (Must Be Zero) bits but specified that they otherwise be ignored. RFC 2474 (DSCP, 1998) did much the same with what it termed the "Currently Unused field". When ECN was introduced, putting those bits to use as a supposedly backwards-compatible congestion signalling mechanism, we discovered that a significant proportion of IP implementations on endpoints, routers, and middleboxes were rejecting (by discard or reset) datagrams with nonzero values in those bits.
Consequently, ECN has taken two decades to fully enable, and this is where both sides of the principle prove their joint and inseparable necessity. To this day, many ECN-aware TCP/IP stacks are passive, stochastic, or incremental in how they advertise ECN, and equally forgiving if the bits coming back don't conform, because an implementation that resets a connection in circumstances where the developer understands the impedance mismatch would be absurd. That fulfils both sides of the maxim, promoting systemic stability and practical availability, and it gave ECN a path to the widespread interoperability it has today.
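A Python sketch of the ECN case, assuming the RFC 3168 layout: the ECN field is the two low-order bits of the IPv4 ToS/DS octet (bits 14 and 15 of the header), with codepoints 00 Not-ECT, 01 ECT(1), 10 ECT(0) and 11 CE. The two "boxes" and their DSCP policy are hypothetical; they only show why a receiver that validated the reserved bits broke once ECN started setting them, while one that never looked at them kept working.

```python
# ECN codepoints (RFC 3168): the two least significant bits of the ToS/DS octet.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

# Hypothetical local forwarding policy: best effort (0) and Expedited Forwarding (46).
ACCEPTED_DSCPS = {0, 46}


def dscp(tos: int) -> int:
    """Upper six bits: the DS codepoint (RFC 2474)."""
    return tos >> 2


def ecn(tos: int) -> int:
    """Lower two bits: the ECN field."""
    return tos & 0b11


def strict_legacy_accept(tos: int) -> bool:
    """Pre-ECN box that also validated the 'must be zero' bits on input."""
    return dscp(tos) in ACCEPTED_DSCPS and ecn(tos) == NOT_ECT


def tolerant_accept(tos: int) -> bool:
    """Box that only looks at the field it understands and ignores the rest."""
    return dscp(tos) in ACCEPTED_DSCPS
```

With `tos = (46 << 2) | ECT_0`, the tolerant box forwards the packet and the strict one drops it, even though both agree on the DSCP.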
The exposition on page 13 of RFC 1122 (Requirements for Internet Hosts, 1989) broadly anticipated this entire scenario, even though the same section misquotes Postel (or, rather, uses the "send" restatement that I find too reductive).
The statement of the robustness principle is an integrated whole. A partial reading is, perhaps ironically, nonconformant; as with Popper's paradox of tolerance, one thing it cannot be liberal about is itself.
"Must be zero" means that when the datum is being constructed, they must be initialized to zero, not that when the datum is being consumed, they must also be validated to be zero.
Violating either rule will cause that implementation not to interoperate properly when it is still found deployed in a future in which the bits have now been put to use.
Rejecting must-be-initialized-to-zero fields for not being zero is not an example of a flaw caused by neglecting to be "liberal in what you accept". It's an example of failing to accept what is required: the requirement is to accept the datum regardless of what is in those reserved bits. It is arguably an instance of failing to "be conservative in what you do". If you are conservative, you stay away from reserved bits. You don't look at them, until such a time as when they have a meaning, accompanied by required (or perhaps optional) processing rules.
Now, I see the point that reading Postel's law might steer some designer away from putting in that harmful check for zero, specifically due to the "be liberal" part of it. But that's just a case of two wrongs accidentally making a right. That same designer might refer to "be liberal" again in some other work, and do something stupid in that context.
The only thing that will really help is clear specs which spell out requirements like "the implementation shall not examine these bits for any purpose, including validating their value".
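A minimal Python sketch of that requirement for a hypothetical one-octet flags field: the sender initializes the reserved bits to zero, and the receiver masks them off without validating or interpreting them. The field layout and function names are invented for illustration.

```python
# Hypothetical flags octet from an imaginary "version 1" spec:
# bits 0-5 are defined, bits 6-7 are reserved ("must be zero").
DEFINED_MASK = 0b0011_1111


def encode_flags(flags: int) -> int:
    """Sender: construct the octet with the reserved bits set to zero."""
    if flags & ~DEFINED_MASK:
        raise ValueError("attempt to set reserved bits")
    return flags


def decode_flags(octet: int) -> int:
    """Receiver: keep only the defined bits. The reserved bits are neither
    validated nor interpreted, so a datum from a later version that gives
    them a meaning is still processed."""
    return octet & DEFINED_MASK
```

The rejection lives only on the sending side, where the implementation controls the bits; the receiving side never looks at them.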
Surely someone at some point thought it was obvious that “No” should mean “false”, and that’s why we’re now in this mess.
If you accept crap, then eventually you will receive only crap.
> ... assume that the network is filled with malevolent entities that will send in packets designed to have the worst possible effect ...
[1] https://datatracker.ietf.org/doc/html/rfc761#section-2.10
https://www.theverge.com/2020/8/6/21355674/human-genes-renam...
And, setting that aside, the very next paragraph says that this is a legit representation of -2.0, which means something has gone gravely wrong:
value: -
# change this to 3.14 one day
2.0
User:
  Name: >-
    Bob
  Phone: >-
    01234 56789
  Description:>-
    This is a
    multi line
    description
That’s both readable and parses your records as strings.
Edit: This Stack Overflow link provides more details: https://stackoverflow.com/questions/3790454/how-do-i-break-a...
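A rough model of what `>-` does to a block like `Description`, written in Python (the language of the PyYAML one-liner further down the thread). `fold_block` is a hypothetical helper, not part of any YAML library, and it deliberately ignores blank and more-indented lines, which the real folding rules treat specially.

```python
def fold_block(lines: list[str], strip: bool = True) -> str:
    """Simplified model of a YAML folded block scalar.

    Assumes the content lines already have their indentation removed and
    contain no blank or more-indented lines. The text is taken literally:
    quotes and backslashes are ordinary characters, not escapes.
    """
    text = " ".join(lines)  # single line breaks between lines fold into spaces
    # '>-' (strip chomping) drops the final line break; plain '>' (clip) keeps one.
    return text if strip else text + "\n"
```

For example, `fold_block(["This is a", "multi line", "description"])` gives `"This is a multi line description"`, and `fold_block(['say "hi"', "C:\\temp"])` keeps the quotes and backslash untouched.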
Seeing that used systemically, versus just for "risky" fields makes me want to draw attention to the fantastic remarshal tool[1], which offers a "--yaml-style >" (and "|" and the rest) which will render yaml fields quoted as one wishes
1: https://github.com/remarshal-project/remarshal#readme and/or $(brew install remarshal)
The trailing ‘:’ was there right after the ‘n’.
Examples of this syntax:
https://github.com/lmorg/murex/blob/master/builtins/core/arr...
I do agree it’s a bit of a kludge. But if you want data types and unquoted strings, then anything you do to the syntax to distinguish strings from other data types becomes a bit of a kludge.
The one good thing about this kludge is it allows for string literals (i.e. no complicated escaping rules).
> Seeing that used systemically, versus just for "risky" fields makes me want to draw attention to the fantastic remarshal tool[1], which offers a "--yaml-style >" (and "|" and the rest) which will render yaml fields quoted as one wishes
I don’t really understand what you’re alluding to here.
$ /usr/local/opt/ansible/libexec/bin/python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read()))' <<YML
User:
Name: >-
Bob
Phone: >-
01234 56789
Description:>-
This is a
multi line
description
YML
yaml.scanner.ScannerError: while scanning a simple key
in "<unicode string>", line 6, column 6:
Description:>-
$ gojq --yaml-input . <<YML
User:
Name: >-
Bob
Phone: >-
01234 56789
Description:>-
This is a
multi line
description
YML
gojq: invalid yaml: <stdin>:6
6 | Description:>-
^ could not find expected ':'
That's because, for better or worse, yaml considers that a legitimate key name, just missing its delimiter:
$ gojq --yaml-input . <<YML
User:
Name: >-
Bob
Phone: >-
01234 56789
Description:>-:
This is a
multi line
description
YML
{
"User": {
"Description:>-": "This is a multi line description",
"Name": "Bob",
"Phone": "01234 56789"
}
}
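A simplified Python model of why `Description:>-:` becomes the key `Description:>-`: in block context a plain key ends at the first `:` followed by whitespace or the end of the line, so a `:` followed by `>` is just another character of the key. `split_block_key` is hypothetical and skips quoted keys, comments and flow collections; its error text mirrors the gojq transcript.

```python
def split_block_key(line: str) -> tuple[str, str]:
    """Split a block-mapping line with a plain (unquoted) key into (key, rest).

    Simplified rule: the key ends at the first ':' that is followed by a
    space, a tab, or the end of the line. Any other ':' is part of the key.
    """
    for i, ch in enumerate(line):
        if ch == ":" and (i + 1 == len(line) or line[i + 1] in " \t"):
            return line[:i], line[i + 1:].strip()
    raise ValueError("could not find expected ':'")
```

So `Name: >-` splits into `("Name", ">-")`, `Description:>-:` into `("Description:>-", "")`, and `Description:>-` has no valid indicator at all.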
This exchange in a thread complaining about the whitespace sensitivity doesn't escape me.
As for remarshal, it was the systemic application of that quoting style that made me think of it, since writing { Name: >- Bob} is the worst of both worlds: not as legible as the plain unquoted version, not suitable for grep, and indentation sensitive.
Further to that point, none of the example links I’ve shared have the : at the end and I have production code that works using the formatting I’ve described. So you’re flat out wrong there with your assumption that block keys always terminate with :
> As for remarshal, it was the systemic application of that quoting style that made me think of it, since writing { Name: >- Bob} is the worst of both worlds: not as legible as the plain unquoted version, not suitable for grep, and indentation sensitive
You wouldn’t write code like that because >- denotes a block and you’re now inlining a string.
I mean I’ve shared links explaining how this works and you’re clearly not reading them.
At the end of the day, I’m not going to argue that >- (and its ilk) solves everything. It clearly doesn’t. If you want to write “minimized” YAML using JSON syntax then you’re far, far better off quoting the string.
But if you are writing a string in YAML and either don’t want to deal with quotation marks, or need that string to be a string literal (i.e. not having to escape things like quotation marks) then my suggestion is an option.
It’s not there as a silver bullet but it is a lesser known feature of YAML. Hence me sharing.
Now go read the links and understand it better. You might genuinely find it useful under some scenarios ;)
And yet I brought receipts for my claims, and you just bring "reed the manul, n00b"
Secondly, your "receipts" were incorrect. Neither of your examples follows the examples I cited, and your second example creates a key named "Description:>-", which is clearly wrong. Hence why ">-" needs to be after the colon.
Here are more examples and evidence of how to use >- and why your "receipts" were also incorrect:
https://go.dev/play/p/1B4ba-dUARq
Here you can clearly see my example:
Foo: >-
  hello
  world
produces: { "Foo": "hello world" }
which is correct.
Whereas your example:
Bar:>-:
  hello
  world
produces { "Bar:\u003e-": "hello world" }
which is incorrect.
----
One final point: I don't understand why you're being so argumentative here. I posted a lesser-known YAML feature in case it helps some people and you've turned it into some kind of pissing match based on bad-faith interpretations of my comments. There was no need for you to do that.
It’s a bit of a sore spot in the YAML community as to why PyYAML can’t / won’t support YAML 1.2. It was in maintenance mode for a while. YAML 1.2 also introduced breaking changes.
From an SO comment: “As long as you're okay with the YAML 1.1 standard, PyYAML is still perfectly fine, secure, etc. If you want to support the YAML 1.2 spec (released in 2009), you can use ruamel.yaml, which started out as a fork of PyYAML.” (CrazyChucky, commented Mar 26, 2023 at 20:51)
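The 1.1 vs 1.2 split is also where the "No" means false trap lives. A small stdlib-only Python sketch comparing the plain-scalar boolean patterns of the YAML 1.1 `bool` type with those of the YAML 1.2 core schema; `resolve_plain` is a hypothetical helper that ignores ints, floats and null, and real parsers differ in details (not every 1.1 parser honours the single-letter `y`/`n` forms).

```python
import re
from typing import Union

# Boolean plain scalars under YAML 1.1's "bool" type ...
YAML11_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
YAML11_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")
# ... and under the YAML 1.2 core schema.
YAML12_TRUE = re.compile(r"true|True|TRUE")
YAML12_FALSE = re.compile(r"false|False|FALSE")


def resolve_plain(value: str, version: str = "1.2") -> Union[bool, str]:
    """Resolve an unquoted scalar to a bool where the schema says so,
    otherwise leave it as a string (ints, floats and null are ignored here)."""
    if version == "1.1":
        true_re, false_re = YAML11_TRUE, YAML11_FALSE
    else:
        true_re, false_re = YAML12_TRUE, YAML12_FALSE
    if true_re.fullmatch(value):
        return True
    if false_re.fullmatch(value):
        return False
    return value


if __name__ == "__main__":
    for country in ("GB", "NO", "SE"):
        print(country, resolve_plain(country, "1.1"), resolve_plain(country, "1.2"))
```

Quoting the scalar (`"NO"`) sidesteps the difference under either version.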
So people work around the little paper cuts, while still hitting the traps from time to time as they forget them.
> generate YAML
I've a hard time finding a situation where I'd want to do that. Usually YAML is chosen for human readability, but here we're already working in a higher-level language. Wouldn't JSON be a more appropriate target most of the time?
> In my opinion, instead of pressuring and insulting people who actually clarify issues with YAML and the wrong statements of some of its proponents, I would kindly suggest reading the JSON spec (which is not that difficult or long) and finally make YAML compatible to it, and educating users about the changes, instead of spreading lies about the real compatibility for many years and trying to silence people who point out that it isn't true.
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
Well that’s disappointing.
I guess software are human texts after all.
I’m just pointing out that it should be very simple to swap a YAML file for a JSON file in any system that accepts YAML
Configuration files for programs. These tend to be short.
DSLs which are large manifests for things like cloud infrastructure. These tend to be long, they grow over time.
My pet hypothesis is these DSLs exist mostly for neutrality - the vendor can't assume you have Python or something present. But as a user, you can assume just that and gain a lot by authoring in a proper language and generating YAML.
See https://github.com/cloudtools/troposphere for a great example for AWS CloudFormation.
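A stdlib-only Python sketch of the "author in a real language, emit the manifest" approach. It builds plain dicts rather than using troposphere (so as not to guess at that library's API) and emits JSON, which CloudFormation accepts directly and which is largely valid YAML for YAML-only consumers. The bucket names and the logical-ID scheme are hypothetical.

```python
import json


def bucket_template(names: list[str]) -> dict:
    """Build a minimal CloudFormation template with one S3 bucket per name."""
    resources = {}
    for name in names:
        # "app-logs" -> "AppLogsBucket"; a loop like this is the kind of thing
        # that is tedious to maintain by hand in a long YAML manifest.
        logical_id = "".join(part.capitalize() for part in name.split("-")) + "Bucket"
        resources[logical_id] = {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": name},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    print(json.dumps(bucket_template(["app-logs", "app-assets"]), indent=2))
```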
This is where I use YAML and it shines there. IMO easier to read and write by hand than JSON, and short sweet config files don't have the various problems people run into with YAML. It's great.
On cloud infra, yes, having one or two layers of languages is a natural situation. GCP and AWS both accepting (encouraging?) JSON as a subset of YAML makes it a simpler choice when picking an auto-generation target.
You mention people wanting to author the generated files; I think in other situations, tweaking the auto-generated files will be seen as riskier, with potential overwriting issues, so lower readability will be seen as a positive.
!!boolean
https://dev.to/kalkwst/a-gentle-introduction-to-the-yaml-for...
Trying to find a tag-line for it I like, maybe “markdown for config”?
Escaped json probably hits that sweet spot by being a bit uglier than yaml, but 100 times simpler than xml, though.
<tasks>
  <ansible.builtin.copy notify="restart minio">
    <src> files/minio.service </src>
    <dest> /etc/systemd/system/minio.service </dest>
    <owner> root </owner>
    <group> root </group>
    <mode> 0x644 </mode>
  </ansible.builtin.copy>
</tasks>
But you could use XSLT to generate documentation in XHTML from your playbooks about what files are deployed, what services are managed...
Also, watch out: 0x644 != 0644, which is the mode you meant.
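The difference is easy to check in Python (which spells octal `0o644`). `0x644` is 1604 in decimal, i.e. octal 3104: it sets the setgid and sticky bits and leaves the owner with execute-only access. `describe` is just an illustrative wrapper around the standard `stat.filemode`.

```python
import stat

HEX_MODE = 0x644    # hexadecimal: 1604 decimal, 0o3104
OCTAL_MODE = 0o644  # the intended rw-r--r--: 420 decimal


def describe(mode: int) -> str:
    """Render permission bits the way `ls -l` shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    for label, mode in (("0x644", HEX_MODE), ("0o644", OCTAL_MODE)):
        print(f"{label:>6} = {mode:>4} = {oct(mode):>7}  {describe(mode)}")
```

`describe(OCTAL_MODE)` gives `-rw-r--r--`, while `describe(HEX_MODE)` gives `---x--Sr-T`.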