Posted by b-man 9/5/2025
I haven't used these very seriously, but a problem I had a while back was that the wire format was not what the applications wanted to use, but a good application format was too space-inefficient for the wire.
As far as I could see there was not a great way to do this. You could rewrite the wire<->app converter in every app; or have a converter program, in which case you essentially have two wire formats and need to fit this extra program and data movement into workflows; or write a library and maintain bindings for all your languages.
This is what Google does. We joke that our entire jobs are "convert protobuf A into protobuf B".
Instead, make codegen a function of BOTH a data schema object and a code template (eg expressed in Jinja2 template language - or ZeroMQ GSL where I first saw this approach). The codegen stage is then simply the application of the template to the data schema to produce a code artifact.
The templates are written assuming the data schema is provided following a meta-schema (eg JSON Schema for a data schema in JSON). One can develop, eg per-language templates to produce serialization code or intra-language converters between serialization forms (on wire) and application friendly forms. The extra effort to develop a template for a particular target is amortized as it will work across all data schemas that adhere to a common meta-schema.
The "codegen" stage can of course be given non "code" templates to produce, eg, reference documentation about the data schema in different formats like HTML, text, nroff/man, etc.
- https://github.com/zeromq/gsl/blob/v4.1.5/examples/fsm_c.gsl
whew, this readme has everything
- XML in, text out: https://github.com/zeromq/gsl#:~:text=feed%20it%20some%20dat...
- a whole section on software engineering https://github.com/zeromq/gsl#model-oriented-programming
- they support COBOL https://github.com/zeromq/gsl#cobol
- and then a project 11 years old with "we're going to document these functions one day" https://github.com/zeromq/gsl#global-functions
What a journey that was
It's probably not like most web applications; it's hardware data loggers that produce hundreds of millions to billions of events per second (each with a minimum of about 4 bytes of wire format and a maximum of roughly 500 bytes).
One day I got annoyed enough to dig for the original proposal and, like 99.9% of initiatives like this, it was predicated on:
- building a list of existing solutions
- building an overly exhaustive list of every facet of the problem to be solved
- declaring that no existing solution hits every point on your inflated list
- concluding that "we must build it ourselves."
It's such a tired playbook, but it works so often unfortunately.
The person who architects and sells it gets points for "impact", then eventually moves on to the next company.
In the meantime the problem being solved evolves and grows (as products and businesses tend to), the homegrown solution no longer solves anything perfectly, and everyone is still stuck dragging along said solution, seemingly forever.
-
Usually eventually someone will get tired enough of the homegrown solution and rightfully question why they're dragging it along, and if you're lucky it gets replaced with something sane.
If you're unlucky, that person also uses it as justification to build a new in-house solution (we built the old one, after all), and you replay the loop.
In the case of serialization though, that's not always doable. This company was storing petabytes (if not exabytes) of data in the format for example.
Despite issues, protobufs solve real problems and (imo) bring more value than cost to a project. In particular, I'd much rather work with protobufs and their generated ser/de than untyped json
funnily enough, this line alone reveals the author to be an amateur in the problem space they are writing so confidently about.
fundamentally, the author refuses to contend with the fact that the context in which Protobufs are used -- millions of messages strewn around random databases and files, read and written by software using different versions of libraries -- is NOT the same scenario where you get to design your types once and then EVERYTHING that ever touches those types is forced through a type checker.
again, this betrays a certain degree of amateurishness on the author's part.
Kenton has already provided a good explanation here: https://news.ycombinator.com/item?id=45140590
the author never claimed the types had to be designed only once; he claimed that the schema evolution model chosen by protobuf is inadequate for the purpose of lossless evolution.
> Kenton has already provided a good explanation here: https://news.ycombinator.com/item?id=45140590
TLDR: yada-yada [...] protobuf is practical, type algebra either doesn't exist or is impractical because only PL theorists know about it, not Kenton.
Hi I'm Kenton. I, too, was enamored with advanced PL theory in college. Designed and implemented my own purely-functional programming language. Still wish someone would figure out a working version of dependent types for real-world use, mainly so we could prove array bounds-safety without runtime checks.
In two decades building real-world complex systems, though, I've found that getting PL theory right is rarely the highest-leverage way to address the real problems of software engineering.
I maintain a React app on the side, and a few other projects, and would still recommend it just due to developer availability, but there’s a saying among some of the Elm folks I know: “Good React code in 2025 looks like good Elm code from 2015.”
(To be fair: teams, and devs new to FP [myself included] will create complexity monstrosities in any paradigm, but Elm’s strong FP setup means huge subsets of those monstrosities won’t ever compile, and usually offer a clearer path for later cleanup.)
Hi Kenton, I'm not sure what kind of PL theory you studied in college, but "array bounds-safety without runtime checks" doesn't require dependent types. It's being proven with several available SMT solvers as of right now; just ask the LLVM folks with their "LLVM_ENABLE_Z3_SOLVER" compiler flag, the one that people build their real-world solutions on.
By the way, you don't have to say "real-world" in every comment to appeal to your google years as a token of "real-world vs the rest of you". "But my team at google wouldn't use it", or something along those lines, right?
https://ats-lang.sourceforge.net/DOCUMENT/INT2PROGINATS/HTML...
Please, Kenton, don't move your goalpost. Who said anything about "unaided"? Annotations, whether they come directly from a developer or from IR metadata, don't suddenly make a provided SAT constraint a "dependent type" component of your type system; it needs a bit more than that. Let's not miss the "types" in "dependent types". You don't modify the type systems of your languages to run SAT solvers in large codebases.
Truly, if you believe that annotations for the purpose of static bounds checking "is not realistic in a large codebase" (or is it because you assume it's unaided?), I've got "google/pytype" and the entire Python community to put before you.
What compels you to do this? Posting just to make people angry? Do you not have anything better to do with all that PL theory expertise?
It does static type checking from _annotations_ that live _outside_ the type system of the language. Have you forgotten that you began to argue that SMT solvers need constraint annotations to be realistic for static bounds checking in large codebases, and that the constraint annotations somehow become dependent types from that fact alone?
> What compels you to do this? Posting just to make people angry? Do you not have anything better to do with all that PL theory expertise?
You're all over the place. It's frustrating that instead of fairly addressing the points about inferior aspects of the protobuf protocol design that are unnecessary for the purpose of backward-compatible distributed systems, you keep saying (or at least assuming) that it's the only realistic solution, because "I worked at google" and "reports at google prove me right".
This is just asking for trouble when the API inevitably breaks, as all APIs eventually do. In our projects I mandated and pushed really hard that we create intermediary data classes that correspond one-to-one to the protobufs (at first).
I got a lot of angry faces and reactions in PRs due to the seemingly useless boilerplate code required, but it saved our butts so many times when the API changed just before a release that it became the de facto standard.
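For a flavor of that boilerplate, a minimal sketch assuming Python; `user_pb2.User` and its fields are hypothetical:
```
# Intermediary data class mirroring a protobuf message one-to-one.
# `user_pb2.User` and its fields are hypothetical; only this conversion layer
# touches the generated type, so wire-format changes stay contained here.
from dataclasses import dataclass

from myservice import user_pb2  # hypothetical generated module


@dataclass(frozen=True)
class User:
    id: str
    display_name: str

    @classmethod
    def from_proto(cls, msg: user_pb2.User) -> "User":
        # Enforce what the application considers required, at the boundary.
        if not msg.id:
            raise ValueError("User.id is required by the application")
        return cls(id=msg.id, display_name=msg.display_name)

    def to_proto(self) -> user_pb2.User:
        return user_pb2.User(id=self.id, display_name=self.display_name)
```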
Also, protobuf and gRPC are de facto standards. Are there better alternatives? Yes. Should you use those? Most likely not, because the point of serialization frameworks is to be used by many people in various tech stacks.
Recently, however, I had the displeasure of working with FlatBuffers. It's worse.
I filed an issue requesting this and it was denied with an explanation:
https://github.com/protocolbuffers/protobuf/issues/7791#issu...
The reason messages are initialized is that you can easily set a deep property path:
```
message SomeY { string example = 1; }
message SomeX { SomeY y = 1; }
```
later, in java:
```
SomeX.Builder some = SomeX.newBuilder();
some.getYBuilder().setExample("hello"); // getYBuilder() is never null, so no NPE
```
in kotlin this syntax makes even more sense:
```
someX {
    y = someY { example = "hello" } // no NPE; the nested message gets its own DSL builder
}
```
This is purportedly fixed in proto3 and latest SDK copies (IIRC)
I do tend to agree that they are bad. I also agree that people put a little too much credence in "came from Google." I can't bring myself to have this much anger towards it. Had to have been something that sparked this.
A few years ago I moved to a large company where protobufs were the standard way APIs were defined. When I first started working with the generated TypeScript code, I was confused as to why almost all fields on generated object types were marked as optional. I assumed it was due to the way people were choosing to define the API at first, but then I learned this was an intentional design choice on the part of protobufs.
We ended up having to write our own code to parse the responses from the "helpfully" generated TypeScript client. This meant we had to also handle rejecting nonsensical responses where an actually required field wasn't present, which is exactly the sort of thing I'd want generated clients to do. I would expect having to do some transformation myself, but not to that degree. The generated client was essentially useless to us, and the protocol's looseness offered no discernible benefit over any other API format I've used.
I imagine some of my other complaints could be solved with better codegen tools, but I think fundamentally the looseness of the type system is a fatal issue for me.
A couple of years ago Connect released a very good generator for TypeScript; we use it in production and it's great:
Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization. You can't specify that an integer falls into a certain valid range or that a string has a valid number of characters or is the correct format (e.g. if it's supposed to be an email or a phone number). The application code needs to do that kind of validation anyway. If something really is required then that should be the application's responsibility to deal with it appropriately if it's missing.
The Cap'n Proto docs also describe why being able to declare required fields is a bad idea: https://capnproto.org/faq.html#how-do-i-make-a-field-require...
But protocol buffers is not just a serialization format it is an interface definition language. And not being able to communicate that a field is required or not is very limiting. Sometimes things are required to process a message. If you need to add a new field but be able to process older versions of the message where the field wasn't required (or didn't exist) then you can just add it as optional.
I understand that in some situations you have very hard compatibility requirements and it makes sense to make everything optional and deal with it in application code, but adding a required attribute to fields doesn't stop you from doing that. You can still just make everything optional. You can even add a CI lint that prevents people from merging code with required fields. But making required fields illegal at the interface definition level just strikes me as killing a fly with a bazooka.
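A hedged sketch of that kind of CI lint, assuming Python and a made-up `proto/` directory layout:
```
# CI lint sketch: fail the build if any .proto file declares a proto2
# `required` field. The proto/ path and the policy itself are assumptions.
import pathlib
import re
import sys

REQUIRED_FIELD = re.compile(r"^\s*required\s+\w", re.MULTILINE)

offenders = [
    path
    for path in pathlib.Path("proto").rglob("*.proto")
    if REQUIRED_FIELD.search(path.read_text())
]

if offenders:
    print("required fields are not allowed in:")
    for path in offenders:
        print(f"  {path}")
    sys.exit(1)
```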
My issue is that people seem to like to use protobuf to describe the shape of APIs rather than just something to handle serialization. I think it's very bad at describing API shapes.
It is amusing, in many ways. This is specifically part of what WSDL aspired to, but people were betrayed by the big companies not having a common ground for what shapes they would support in a description.
A parser doesn't inherently have to fail (compatibility mode), nor lose the new field (passthrough mode), nor allow divergence (strict mode). The fact that the capnproto/parser authors don't realize that the same single protocol can operate in three different scenarios (strictly speaking: at boundaries vs. in middleware) at the same time should not lead you to think there are problems with required fields in protocols. This is one of the most bizarre kinds of FUD in the industry.
Sure! You could certainly imagine extending Protobuf or Cap'n Proto with a way to specify validation that only happens when you explicitly request it. You'd then have separate functions to parse vs. to validate a message, and then you can perform strict validation at the endpoints but skip it in middleware.
This is a perfectly valid feature idea which many people have entertained and even implemented successfully. But I tend to think it's not worth trying to have this in the schema language, because in order to support every kind of validation you might want, you end up needing a complete programming language. Plus, different components might have different requirements and therefore need different validation (e.g. middleware vs. endpoints). In the end I think it is better to write any validation functions in your actual programming language. But I can certainly see where people might disagree.
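A minimal sketch of that parse/validate split, assuming Python protobuf; `somerequest_pb2.SomeRequest` and its `user_id`/`email` fields are hypothetical:
```
# Separate "parse" (wire decoding, run everywhere) from "validate" (endpoint
# rules). `somerequest_pb2.SomeRequest` and its fields are hypothetical.
from myservice import somerequest_pb2  # hypothetical generated module


def parse(raw: bytes) -> somerequest_pb2.SomeRequest:
    # Wire-level decoding only; missing fields simply come back as defaults.
    msg = somerequest_pb2.SomeRequest()
    msg.ParseFromString(raw)
    return msg


def validate(msg: somerequest_pb2.SomeRequest) -> None:
    # Endpoint-level rules the schema language doesn't try to express.
    if not msg.user_id:
        raise ValueError("user_id is required")
    if msg.email and "@" not in msg.email:
        raise ValueError("email looks malformed")


# Endpoints call parse() then validate(); middleware that just forwards the
# message calls parse() alone and skips the strict checks.
```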
A very common example I see is Vec3 (just x, y, z). In proto2 you should be checking for the presence of x,y,z every time you use them, and when you do that in math equations, the incessant existence checks completely obscure the math. Really, you want to validate the presence of these fields during the parse. But in practice, what I see is either just assuming the fields exist in code and crashing on null, or admitting that protos are too clunky to use, and immediately converting every proto into a mirror internal type. It really feels like there's a major design gap here.
Don't get me started on the moronic design of proto3, where every time you see Vec3(0,0,0) you get to wonder whether it's the right value or mistakenly unset.
That's why Protobuf and Cap'n Proto have default values. You should not bother checking for presence of fields that are always supposed to be there. If the sender forgot to set a field, then they get the default value. That's their problem.
> just assuming the fields exist in code and crashing on null
There shouldn't be any nulls you can crash on. If your protobuf implementation is returning null rather than a default value, it's a bad implementation, not just frustrating to use but arguably insecure. No implementation of mine ever worked that way, for sure.
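For what it's worth, a small sketch of what that looks like from Python, assuming a hypothetical proto3 `Vec3` message with float fields x, y, z:
```
# Assuming a hypothetical proto3 message Vec3 { float x = 1; float y = 2; float z = 3; }
# generated into vec3_pb2. Unset scalars read back as defaults, never null.
from geometry import vec3_pb2  # hypothetical generated module

v = vec3_pb2.Vec3()  # sender set nothing
length_sq = v.x * v.x + v.y * v.y + v.z * v.z  # 0.0, not None; no crash

# If "really zero" vs "never set" matters, declaring the field `optional` in
# proto3 restores explicit presence tracking, e.g. v.HasField("x").
```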
It's an incredibly frustrating "feature" to deal with, and causes lots of problems in proto3.
But if you don't check, it should return a default value rather than null. You don't want your server to crash on bad input.
What happens if you mark a field as required and then you need to delete it in the future? You can't because if someone stored that proto somewhere and is no longer seeing the field, you just broke their code.
But in some situations you can be pretty confident that a field will be required always. And if you turn out to be wrong then it's not a huge deal. You add the new field as optional first (with all upgraded clients setting the value) and then once that is rolled out you make it required.
And if a field is in fact semantically required (like the API cannot process a request without the data in a field) then making it optional at the interface level doesn't really solve anything. The message will get deserialized but if the field is not set it's just an immediate error which doesn't seem much worse to me than a deserialization error.
2. This is the problem: software (and protos) can live for a long time. They might be used by other clients elsewhere that you don't control. What you thought was required might not be anymore 10 years down the line. What you "think" is not a huge deal then becomes a huge deal and can cause downtime.
3. You're mixing up business logic and over-the-wire field requirements. If a message is required for an interface to function, you should be checking it anyway and returning the correct error. How does that change with proto supporting required?
It can be required in v2 but not in v1 which was my point. If the client is running v2 while the server is still on v1 temporarily, then there is no problem. The server just ignores the new field until it is upgraded.
> This is the problem: software (and protos) can live for a long time. They might be used by other clients elsewhere that you don't control. What you thought was required might not be anymore 10 years down the line. What you "think" is not a huge deal then becomes a huge deal and can cause downtime.
Part of this is just that trying to create a format that is suitable both as an rpc wire serialization format and ALSO a format suitable for long term storage leads to something that is not great for either use case. But even taking that into account, RDBMS have been dealing with this problem for decades and every RDBMS lets you define fields as non-nullable.
> If a message is required for an interface to function, you should be checking it anyway and returning the correct error. How does that change with proto supporting required?
That's my point: you have to do that check in code, which clutters the implementation with validation noise. That, and you often can't use the wire message in your internal domain model, since you now have to do that defensive null-check everywhere the object is used.
Aside from that, protocol buffers are an interface definition language so should be able to encode some of the validation logic at least (make invalid states unrepresentable and all that). If you are just looking at the proto IDL you have no way of knowing whether a field is really required or not because there is no way to specify that.
It isn't that you can't do it. But the code side of the equation is the cheap side.
Too often I find something mildly interesting, but then realize that in order for me to try to use it I need to set up a personal mirror of half of Google's tech stack to even get it to start.
https://protobuf.dev/design-decisions/nullable-getters-sette...