Posted by hgs3 2 days ago
In my opinion, JSON works well for data interchange, but it's overused for configuration, it's not localization-friendly, and it's too syntactically noisy. INI is simple but lacks hierarchical structures and doesn't have a formal specification. Confetti is intended to bridge the gap.
I aim to keep Confetti simple and minimalistic, while encouraging others to extend it. Think of it like Markdown for configuration files: there's a core specification, but you're welcome to create your own variations that suit your needs.
It clearly competes with JSON.
I think I would still much rather use JSON5 over this. It's quite similar in terms of structure and terseness, but I don't have to learn anything.
// This is a comment.
{
  probe_device: ["eth0", "eth1"],
  users: [
    {
      user: "*",
      login: "anonymous",
      password: "${ENV:ANONPASS}",
      machine: "167.89.14.1",
      proxy: {
        try_ports: [582, 583, 584],
      },
    },
    {
      user: "Joe Williams",
      login: "joe",
      machine: "167.89.14.1",
    },
  ],
}
Still, it seems fairly well designed and elegant. Way better than YAML or TOML for example. Typeless seems like a bad decision in some ways but I can see the advantages. Top marks on the name!
Call it j5on
The first paragraph says:
[...] It is minimalistic, untyped, and opinionated. [...]
but then under "Notable features" it begins with a big bold *Unopinionated*, so that was very confusing.
Good to see a push towards less syntactic overhead, which is still considerable in JSON.
[1] https://en.wikipedia.org/wiki/Comparison_of_data-serializati... [2] https://rigaux.org/language-study/syntax-across-languages.ht...
> Confetti source text consists of zero or more Unicode scalar values. For compatibility with source code editing tools that add end-of-file markers, if the last character of the source text is a Control-Z character (U+001A), implementations may delete this character.
I’ve heard of this once, when researching ASCII control codes and related ancient history, but never once seen it in real life. If you’re insisting on valid Unicode, it sounds to me like you’re several decades past that happening.
And then given that you forbid control characters in the next section… make up your mind. You’re saying both that implementations MAY delete this character, and that source MUST NOT use it. This needs clarification. In the interests of robustness, you need to specify what parsers MUST/SHOULD/MAY do in case of content MUST violations, whether it be reject the entire document, ignore the line, replace with U+FFFD, &c. (I would also recommend recapitalising the RFC 2119 terms. Decapitalising them doesn’t help readability because they’re often slightly awkward linguistically without the reminder of the specific technical meaning; rather it reduces their meaning and impact.)
> For compatibility with Windows operating systems, implementations may treat the sequence Carriage Return (U+000D) followed by Line Feed (U+000A) as a single, indivisible new line character sequence.
This is inviting unnecessary incompatibility. I recommend that you either mandate CRLF merging, or mandate CR stripping, or disallow special CRLF handling. Otherwise you can cause different implementations to parse differently, which has a long history of causing security problems, things like HTTP request smuggling.
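To make the divergence concrete, here is a rough Python sketch. It is a toy model, not the Confetti grammar: `directives` and its `merge_crlf` flag are made up for illustration, and it assumes a backslash directly before a line terminator continues the line and that a non-merging implementation treats a lone CR as a terminator.

```python
def directives(text, merge_crlf):
    """Toy splitter: return one list of arguments per directive.

    A backslash directly before a line terminator continues the line.
    With merge_crlf=True, CR LF counts as a single terminator;
    otherwise CR and LF are each terminators on their own.
    """
    if merge_crlf:
        text = text.replace("\r\n", "\n")
    out, current, i = [], "", 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in "\r\n":
            i += 2  # continuation: drop the backslash and ONE terminator
            continue
        if ch in "\r\n":
            if current.strip():
                out.append(current.split())
            current = ""
        else:
            current += ch
        i += 1
    if current.strip():
        out.append(current.split())
    return out


# Same bytes, saved with Windows line endings.
source = "try_ports 582 \\\r\n583\r\n"
merging = directives(source, merge_crlf=True)       # one directive, three arguments
non_merging = directives(source, merge_crlf=False)  # two directives
```

Under these assumptions the merging parser sees one `try_ports` directive with two ports, while the non-merging one sees `try_ports 582` followed by a separate directive `583`. Two conforming implementations disagreeing on the same bytes is the situation I'd want the spec to rule out.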
I acknowledge this is intended as the base for a family of formats, rather than a strict single spec, but I still think allowing such variation for no good reason is a bad idea. (I’m not all that eager about the annexes, either.)
Most existing formats are really bad for at least one of these. Tables in JSON have tons of repetition. XML doesn't have a clear and obvious way to do maps. Almost anything other than XML is awkward at best for node trees.
Confetti seems to cover maps, trees, and non-nested lists really well, which isn't a combination any other format I'm aware of covers as well.
Nested lists and tables seem like they would be more awkward, though from what I can tell "-" is a legal argument, so you could do:
nestedlist {
    - { - 1 ; - 2 }
    - {
        - { - a ; - b }
        - { - c ; - d }
    }
}
To get something like [[1, 2], [[a, b], [c, d]]]. Of course you could also name the items (item { item 1 ; item 2 }), but either way this is certainly more awkward than a nested list in JSON or YAML.
I think a table could be done like JSON/HTML with repeated keys, but maybe also like:
table name age favorite-color {
    row Bob 87 red
    row "Someone else" 106 "bright neon green"
}
This is actually pretty nice.
In any event, I love seeing more exploration of configuration languages, so thanks for sharing this!
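For what it's worth, here is roughly how an application might interpret the table layout above once it has been parsed. This is only a sketch in Python: the dict shape with "args" and "children" is a hypothetical stand-in for whatever a real Confetti parser returns, and `table_to_records` is a name I made up. Values stay strings because the format is untyped.

```python
def table_to_records(directive):
    """Turn a parsed `table col1 col2 ... { row v1 v2 ... }` directive
    into a list of dicts keyed by column name.

    `directive` is a hypothetical parse result of the form
    {"args": [...], "children": [{"args": [...]}, ...]}.
    """
    columns = directive["args"][1:]  # skip the "table" keyword itself
    records = []
    for child in directive["children"]:
        keyword, *values = child["args"]
        if keyword != "row":
            continue  # ignore anything that is not a row
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        records.append(dict(zip(columns, values)))
    return records


# Hand-built stand-in for the parse tree of the table example.
parsed = {
    "args": ["table", "name", "age", "favorite-color"],
    "children": [
        {"args": ["row", "Bob", "87", "red"]},
        {"args": ["row", "Someone else", "106", "bright neon green"]},
    ],
}
records = table_to_records(parsed)
```

Deciding that "87" is a number is left to the application, which fits the user-interpreted design.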
My number 1 request is a parser on the documentation page that shows parse tree and converts to JSON or other formats so you can play with it.
One thing I didn't understand is this example on the homepage:
> password "${ENV:ANONPASS}"
The spec doesn't seem to mention any ${}. Is this for the program to manage rather than the parser of the config going out to fetch an env var? If so, I find this a bit out of scope to show; at least, it confused me about whether that's built-in/supported syntax or if it's just a literal with syntax intended for a different program.
Depending on how set in stone this is, another complaint I might have is that you still have the trailing comma issue from JSON, except it's not a comma but a backslash (reverse solidus, as the spec calls it—my mobile keyboard didn't even know that word). Maybe starting a list of arguments with [ could allow one to use any number of lines for the values, until a ] is encountered?
"Reverse Solidus" is the Unicode name for the character [1], so if you don't like the name, blame Unicode :)
I hadn't thought of using '[' and ']' for multi-line directives, that's an interesting suggestion. It vaguely resembles arrays as they appear in various other languages. It fits with Confetti's design of, ultimately, being user interpreted.