Posted by b-man 9/5/2025
Tag-length-value (TLV) encodings are just overly verbose for no good reason. They are _NOT_ "self-describing", and one does not need everything tagged to support extensibility. Even where one does need tags, tag assignments can be fully automatic and need not be exposed to the module designer. Anyone with a modicum of time spent researching how ASN.1 handles extensibility with non-TLV encoding rules knows these things. The entire arc of ASN.1's evolution over two plus decades was all about extensibility and non-TLV encoding rules!
And yes, ASN.1 started with the same premise as PB, but 40 years ago. Thus it's terribly egregious that PB's designers did not learn any lessons at all from ASN.1!
Near as I can tell, PB's designers thought they knew about encodings but didn't, and they refused to look at ASN.1 and the like because of the lack of tooling for ASN.1. But of course there was even less tooling for PB, since it didn't exist yet.
It's all exasperating.
It's a lesson most people learn the hard way after using PBs for a few months.
    message AppLogMessage {
        sint32 Value1 = 1;
        double Value2 = 2;
    }
becomes
    type Example struct {
        state                    protoimpl.MessageState
        xxx_hidden_Value1        int32
        xxx_hidden_Value2        float64
        xxx_hidden_unknownFields protoimpl.UnknownFields
        sizeCache                protoimpl.SizeCache
    }
For [place of work] where we use protobuf I ended up making a plugin to generate structs that don't do any of the nonsense (essentially automating Option 1 in the article):
    type ExamplePOD struct {
        Value1 int32
        Value2 float64
    }
with converters between the two versions.
Adds a lot of space overhead, especially for structs only used once, yet it's not self-descriptive either.
Doesn’t solve a lot of problems related to changes either.
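Coming back to the plain-struct plugin mentioned a few lines up: the generated converters could look roughly like the Go sketch below. This is only a guess at the shape, not the poster's actual plugin output. The Example type here is a simplified stand-in for the generated opaque message (the real one also carries protoimpl state and is accessed through generated getters and setters), and ToPOD / FromPOD are hypothetical names.

```go
package main

import "fmt"

// Example is a simplified stand-in for the generated opaque message.
// The fields are hidden; callers go through getters and setters.
type Example struct {
	xxx_hidden_Value1 int32
	xxx_hidden_Value2 float64
}

// Getters are nil-safe, mirroring the usual generated behavior.
func (x *Example) GetValue1() int32 {
	if x == nil {
		return 0
	}
	return x.xxx_hidden_Value1
}

func (x *Example) GetValue2() float64 {
	if x == nil {
		return 0
	}
	return x.xxx_hidden_Value2
}

func (x *Example) SetValue1(v int32)   { x.xxx_hidden_Value1 = v }
func (x *Example) SetValue2(v float64) { x.xxx_hidden_Value2 = v }

// ExamplePOD is the plain struct from the comment above.
type ExamplePOD struct {
	Value1 int32
	Value2 float64
}

// ToPOD (hypothetical name) copies the message into a plain struct.
func ToPOD(m *Example) ExamplePOD {
	return ExamplePOD{Value1: m.GetValue1(), Value2: m.GetValue2()}
}

// FromPOD (hypothetical name) builds a message from a plain struct.
func FromPOD(p ExamplePOD) *Example {
	m := &Example{}
	m.SetValue1(p.Value1)
	m.SetValue2(p.Value2)
	return m
}

func main() {
	pod := ExamplePOD{Value1: -3, Value2: 1.5}
	roundTripped := ToPOD(FromPOD(pod))
	fmt.Println(roundTripped == pod)
}
```

Application code then works only with ExamplePOD and converts at the serialization boundary, at the cost of one extra copy per message.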
Quite frankly, too many are caught up in it because it came from Google and is supposed to be some sort of divinely inspired thing.
JSON, ASN.1, and even rigid C structs start to look a lot better.
He also removed the capability to define a structure, forcing you to use a dictionary (structure) of arrays instead of an array of structures.
Not always - in browser applications, for example, there is no way to directly access the disk, never mind mmap().