Posted by b-man 9/5/2025
> * Make all fields in a message required. This makes messages product types.
Meanwhile in the capnproto FAQ:
>How do I make a field “required”, like in Protocol Buffers?
>You don’t. You may find this surprising, but the “required” keyword in Protocol Buffers turned out to be a horrible mistake.
I recommend reading the rest of the FAQ [0], but if you are in a hurry: fixed-schema protocols like Protocol Buffers do not let you remove fields the way self-describing formats such as JSON do. Removing a field, or switching it from required to optional, is an ABI-breaking change. Nobody wants to update all servers and all clients simultaneously. At that point, you would be better off defining a new API endpoint and deprecating the old one.
The capnproto FAQ also argues that validation should be handled at the application level rather than at the ABI level.
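To make the ABI-breaking point concrete, here is a rough sketch in Go. The package oldpb is hypothetical, standing in for bindings generated from a proto2 schema that still contains `required int32 id = 1;` and `optional string name = 2;`; the wire bytes simulate a newer peer that dropped the required field.

    // Rough sketch only; "oldpb" is a hypothetical generated package.
    package main

    import (
        "fmt"

        "google.golang.org/protobuf/proto"

        oldpb "example.com/gen/oldpb" // hypothetical old-schema bindings
    )

    func main() {
        // Wire bytes from a newer peer whose schema dropped field 1 entirely,
        // so only field 2 ("name" = "ada") is present.
        fromNewPeer := []byte{0x12, 0x03, 'a', 'd', 'a'}

        var msg oldpb.User
        err := proto.Unmarshal(fromNewPeer, &msg)
        // err is non-nil: the required field is missing, so the old client
        // rejects traffic from the newer peer until it is rebuilt too.
        fmt.Println(err)
    }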
This sums up a lot of the issues I’ve seen with protobuf as well. It’s not an expressive enough language to be the core data model, yet people use it that way.
In general, if you don't have extreme network needs, protobuf seems to cause more harm than good. I've watched Go teams spend months implementing proto-based systems with little to no gain over plain REST.
X.680 is fairly small. Require AUTOMATIC TAGS, remove manual tagging, remove REAL and EMBEDDED PDV and such things, and what you're left with is pretty small.
Protobufs have lots of problems, but at least they are better than ASN.1!
Things people say who know very little about ASN.1:
- it's bloated! (it's not)
- it's had lots of vulnerabilities! (mainly in hand-coded codecs)
- it's expensive (it's not -- it's free and has been for two decades)
- it's ugly (well, sure, but so is PB's IDL)
- the language is context-dependent, making it harder to write a parser for (this is quite true, but so what, it's not that big a deal)
The vulnerabilities were only ever in implementations, and almost entirely in hand-coded codecs, and the thing that made many of these vulnerabilities possible was the use of tag-length-value encoding rules (BER/DER/CER), which, ironically, Protocol Buffers bloody well is too.
If you have different objections to ASN.1, please list them.
- There is no backward or forward compatibility by default.
(Sure, you can make every SEQUENCE have all fields OPTIONAL with ... at the end, but how many real-life schemas like that have you seen? Almost every ASN.1 schema you can find on the internet is a static SEQUENCE, with no extensibility whatsoever.)
- Tools are bad.
Yes, protoc can be a PITA to integrate into a build system, but at least it (1) exists, (2) is well-tested, and (3) supports many languages. Compare that to ASN.1, where good tooling is so rare that people routinely parse and generate the files by hand!
- Honorable mention: using the "tag" in TLV to describe only the type and not the field name. That SEQUENCE (0x30) tag will be all over the place, and the contents will be wildly different. Compare protobuf, where the "tag" is the field index, which is exactly what allows such great forward/backward compatibility (see the sketch below).
(Could ASN.1 fix those problems? Not sure. Yes, maybe one could write better tooling, but all the existing users know that extensibility is for the weak, and non-optional SEQUENCEs are the way to go. It is easier to write an all-new format than to try to change existing conventions.)
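To make the wire-tag point concrete, a minimal sketch (plain Go, not the official protobuf library) that decodes the leading tag of a protobuf wire-format field; it shows that the tag carries the field number plus a wire type, nothing about the type name:

    package main

    import "fmt"

    // readUvarint decodes a base-128 varint from b and returns the value and
    // the number of bytes consumed.
    func readUvarint(b []byte) (uint64, int) {
        var v uint64
        for i, c := range b {
            v |= uint64(c&0x7f) << (7 * uint(i))
            if c&0x80 == 0 { // continuation bit clear: last byte
                return v, i + 1
            }
        }
        return 0, 0 // truncated input
    }

    func main() {
        // 0x08 0x2a encodes field number 1, wire type 0 (varint), value 42,
        // e.g. `int32 id = 1;` set to 42.
        msg := []byte{0x08, 0x2a}

        tag, n := readUvarint(msg)
        fieldNum := tag >> 3 // upper bits: the field number from the .proto file
        wireType := tag & 7  // lower 3 bits: how the value is encoded
        value, _ := readUvarint(msg[n:])

        fmt.Printf("field=%d wiretype=%d value=%d\n", fieldNum, wireType, value)
        // Output: field=1 wiretype=0 value=42
    }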
ASN.1 in 1984 had it. Later ASN.1 evolved to have a) explicit extensibility markers, and b) the `EXTENSIBILITY IMPLIED` module option that implies every SEQUENCE, SET, ENUM, and other things are extensible by default, as if they ended in `, ...`.
There are good reasons for this change:
- not all implementors had understood the intent, so not all had implemented "ignore unexpected new fields"
- sometimes you want non-extensible things
- you may actually want to record in the syntax all the sets of extensions
> - Tools are bad.
But there were zero -ZERO!- tools for PB when Google created PB. Don't you see that "the tools that existed were shit" is not a good argument for creating tools for a completely new thing instead?
> - Honorable mention: using the "tag" in TLV to describe only the type and not the field name. That SEQUENCE (0x30) tag will be all over the place, and the contents will be wildly different. Compare protobuf, where the "tag" is the field index, which is exactly what allows such great forward/backward compatibility.
In a TLV encoding you can often use the "type" as the tag for every field, namely when there is no ambiguity from OPTIONAL fields being present or absent; when you do have such ambiguities, you can resort to manual tagging with field numbers or whatever you want. For example:
    Thing ::= SEQUENCE {
        a UTF8String,
        b UTF8String
    }
works even though both fields get the same tag (when using a TLV encoding) because both fields are required, while this is broken:

    Broken ::= SEQUENCE {
        a UTF8String OPTIONAL,
        b UTF8String
    }
and you would have to fix it with something like:

    Fixed ::= SEQUENCE {
        a [0] UTF8String OPTIONAL,
        b UTF8String
    }
What PB does is require the equivalent of manually applying what ASN.1 calls IMPLICIT tags to every field, which is silly and makes it harder to decode data without reference to the module that defines its schema (this last is sketchy anyway, and I don't think it is a huge advantage of the ASN.1 BER/DER way of doing things, though others will disagree).
> (Could ASN.1 fix those problems? Not sure. Yes, maybe one could write better tooling, but all the existing users know that extensibility is for the weak, and non-optional SEQUENCEs are the way to go. It is easier to write an all-new format than to try to change existing conventions.)
ASN.1 does not have these problems.
Better tooling does exist and can exist -- it's no different than writing PB tooling, at least for a subset of ASN.1, because ASN.1 does have many advanced features that PB lacks, and obviously implementing all of ASN.1 is more work than implementing all of PB.
> It is easier to write an all-new format than to try to change existing conventions.
Maybe, but only if you have a good handle on what came before.
I strongly recommend that you actually read X.680.
Nearly every other complaint is solved by wrapping things in messages (sorry, product types). I don't get the enum limitation on map keys, though; that complaint is fair.
Protobuf eliminates truckloads of stupid serialization/deserialization code that, in my embedded world, almost always has to be hand-written otherwise. If there were a tool that automatically spat out matching C, Kotlin, and Swift parsers from CDDL, I'd certainly give it a shot.
Some solutions do exist. Here's a C one [1]; maybe you could throw in some WASI/WASM compilation and get "somewhat" idiomatic bindings in a bunch of languages.
Here's another for Rust [2], but I'm sure I've seen a bunch of others around. I think what's missing is a unified protoc-style binary with language-specific plugins.
It's only proto3 that doesn't distinguish between zero and unset by default. Both the earlier and later versions support it.
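A rough sketch of the difference, assuming a hypothetical generated package examplepb built from the proto3 schema shown in the leading comment:

    // Sketch only, assuming "examplepb" was generated from:
    //
    //   syntax = "proto3";
    //   message Counter {
    //     int32 count = 1;              // implicit presence: zero and unset look the same
    //     optional int32 opt_count = 2; // explicit presence, the later opt-in
    //   }
    package main

    import (
        "fmt"

        "google.golang.org/protobuf/proto"

        examplepb "example.com/gen/examplepb" // hypothetical generated code
    )

    func main() {
        unset := &examplepb.Counter{}
        zero := &examplepb.Counter{Count: 0}

        a, _ := proto.Marshal(unset)
        b, _ := proto.Marshal(zero)
        fmt.Println(len(a) == len(b)) // true: with implicit presence, 0 is simply not encoded

        // With the opt-in `optional` keyword the generated field is a pointer,
        // so "set to zero" and "never set" are distinguishable again.
        withZero := &examplepb.Counter{OptCount: proto.Int32(0)}
        fmt.Println(withZero.OptCount != nil, unset.OptCount != nil) // true false
    }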
Proto3 was a giant pile of poop in most respects, including removing support for field presence. They eventually put it back in as a per-field opt-in property, but by then the damage was done.
A huge unforced mistake, but I don't think a change made after the library had existed for 15 years and reverted qualifies as an "original sin".
Most of the other issues in the article can be solved by wrapping things in more messages. Not great, not terrible.
As with the tightly-coupled issues with Go, I'll keep waiting for a better approach any decade now. In the meantime, both tools (for all their glaring imperfections) work well enough, solve real business use cases, and have a massive ecosystem moat that makes them easy to work with.
They did it essentially like a linked list, C strings, or UTF-8 characters: "here is the current data, and is there more (next pointer, non-null byte, continuation bit set)?" They also noted that it could have these semantics without necessarily following this implementation encoding, though that seems like a dodge to me; a length-prefixed array is a perfectly fine primitive to have, and shouldn't have to be inferred from something that merely maps to it.
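For reference, this is exactly the "is there more?" pattern protobuf's varints use: each byte carries 7 bits of payload and the top bit says whether another byte follows. A minimal sketch:

    package main

    import "fmt"

    func encodeUvarint(v uint64) []byte {
        var out []byte
        for v >= 0x80 {
            out = append(out, byte(v)|0x80) // set the continuation bit: more to come
            v >>= 7
        }
        return append(out, byte(v)) // final byte: continuation bit clear
    }

    func main() {
        fmt.Printf("% x\n", encodeUvarint(1))     // 01
        fmt.Printf("% x\n", encodeUvarint(300))   // ac 02
        fmt.Printf("% x\n", encodeUvarint(86942)) // 9e a7 05
    }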
This is not pb nor go. A sensible default of invalid state would have caught this. So would an error and crash. Either would have been better than corrupt data.
So HN, what are the best alternatives available today and why?
Then it hardly solves the same problem Protobuf solves.
It's the new default in a lot of IoT specs, it's the backbone for deep-space communication networks, etc.
It maintains interoperability with JSON and is very much battle-tested in very challenging environments.
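Assuming the format being described here is CBOR (CDDL came up upthread, and CBOR is what a number of recent IoT and delay-tolerant-networking specs build on), a rough sketch of what using it looks like in Go with the third-party fxamacker/cbor package; the JSON-like data model is part of the appeal:

    package main

    import (
        "fmt"

        "github.com/fxamacker/cbor/v2"
    )

    type Reading struct {
        Sensor string  `cbor:"sensor"`
        Value  float64 `cbor:"value"`
    }

    func main() {
        data, err := cbor.Marshal(Reading{Sensor: "temp", Value: 21.5})
        if err != nil {
            panic(err)
        }
        fmt.Printf("% x\n", data) // compact binary, but the data model maps onto JSON

        var r Reading
        if err := cbor.Unmarshal(data, &r); err != nil {
            panic(err)
        }
        fmt.Println(r.Sensor, r.Value)
    }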
I don't actually want to do this, because then you have N + 1 implementations of each data type, where N = number of programming languages touching the data, and + 1 for the proto implementation.
What I personally want to do is use a language-agnostic IDL to describe the types that my programs use. Within Google you can even do things like just store them in the database.
The practical alternative is to use JSON everywhere, possibly with some additional tooling to generate code from a JSON schema. JSON is IMO not as nice to work with. The fact that it's also slower probably doesn't matter to most codebases.
I think this is exactly what you end up with using protobuf. You have an IDL that describes the interface types but then protoc generates language-specific types that are horrible so you end up converting the generated types to some internal type that is easier to use.
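As a sketch of that conversion layer (all names hypothetical; userpb stands in for whatever protoc generated):

    // Sketch of the "generated type vs. internal type" split described above.
    package main

    import (
        "fmt"
        "time"

        userpb "example.com/gen/userpb" // hypothetical generated package
    )

    // Domain model: the shapes the application actually wants to work with.
    type User struct {
        ID        int64
        Name      string
        CreatedAt time.Time
    }

    // userFromProto is the boilerplate conversion layer that tends to grow
    // alongside every message (plus its mirror image going the other way).
    func userFromProto(p *userpb.User) User {
        return User{
            ID:        p.GetId(),
            Name:      p.GetName(),
            CreatedAt: p.GetCreatedAt().AsTime(), // google.protobuf.Timestamp -> time.Time
        }
    }

    func main() {
        fmt.Println(userFromProto(&userpb.User{Name: "ada"}))
    }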
Ideally if you have an IDL that is more expressive then the code generator can create more "natural" data structures in the target language. I haven't used it a ton, but when I have used thrift the generated code has been 100x better than what protoc generates. I've been able to actually model my domain in the thrift IDL and end up with types that look like what I would have written by hand so I don't need to create a parallel set of types as a separate domain model.
Protobuf has a bidirectional JSON mapping that works reasonably well for a lot of use cases.
I have used it to skip the protobuf wire format altogether and just use protobuf for the IDL and multi-language bindings, both of which IMO are far better than JSON Schema.
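A small sketch of that round trip using protojson and a well-known type (structpb), so it runs without any generated code; the same two calls apply to your own messages:

    package main

    import (
        "fmt"

        "google.golang.org/protobuf/encoding/protojson"
        "google.golang.org/protobuf/types/known/structpb"
    )

    func main() {
        msg, err := structpb.NewStruct(map[string]interface{}{
            "name": "ada",
            "id":   42,
        })
        if err != nil {
            panic(err)
        }

        // Proto -> JSON, following the canonical proto JSON mapping.
        data, err := protojson.Marshal(msg)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(data)) // e.g. {"id":42,"name":"ada"} (spacing/order may vary)

        // JSON -> proto, the other direction of the mapping.
        var back structpb.Struct
        if err := protojson.Unmarshal(data, &back); err != nil {
            panic(err)
        }
        fmt.Println(back.Fields["name"].GetStringValue())
    }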
JSON Schema is definitely more powerful, though, letting you do things like field-level constraints. I'd love to see a tool that paired the best of both.
Beyond that it is a very simple language. But yes, 100%, for better and for worse, it is deeply inspired by Google's codebase and needs.