Protobuffers Are Wrong (2018)

Posted by b-man 9/5/2025

Protobuffers Are Wrong (2018)(reasonablypolymorphic.com)

244 points | 307 comments

lalaithion 9/5/2025|

Protocol buffers suck but so does everything else. Name another serialization declaration format that both (a) defines which changes can be make backwards-compatibly, and (b) has a linter that enforces backwards compatible changes.

Just with those two criteria you’re down to, like, six formats at most, of which Protocol Buffers is the most widely used.

And I know the article says no one uses the backwards compatible stuff but that’s bizarre to me – setting up N clients and a server that use protocol buffers to communicate and then being able to add fields to the schema and then deploy the servers and clients in any order is way nicer than it is with some other formats that force you to babysit deployment order.

The reason why protos suck is because remote procedure calls suck, and protos expose that suckage instead of trying to hide it until you trip on it. I hope the people working on protos, and other alternatives, continue to improve them, but they’re not worse than not using them today.

jitl 9/5/2025||

Not widely used but I like Typical's approach

https://github.com/stepchowfun/typical

> Typical offers a new solution ("asymmetric" fields) to the classic problem of how to safely add or remove fields in record types without breaking compatibility. The concept of asymmetric fields also solves the dual problem of how to preserve compatibility when adding or removing cases in sum types.

rkagerer 9/6/2025|||

More direct link to the juicy bit: https://github.com/stepchowfun/typical?tab=readme-ov-file#as...

An asymmetric field in a struct is considered required for the writer, but optional for the reader.

sdenton4 9/6/2025||

That's a nice idea... But I believe the design direction of proto buffers was to make everything `optional`, because `required` tends to bite you later when you realize it should actually be optional.

bilkow 9/6/2025||

My understanding is that asymmetric fields provide a migration path in case that happens, as stated in the docs:

> Unlike optional fields, an asymmetric field can safely be promoted to required and vice versa.

> [...]

> Suppose we now want to remove a required field. It may be unsafe to delete the field directly, since then clients might stop setting it before servers can handle its absence. But we can demote it to asymmetric, which forces servers to consider it optional and handle its potential absence, even though clients are still required to set it. Once that change has been rolled out (at least to servers), we can confidently delete the field (or demote it to optional), as the servers no longer rely on it.

yencabulator 7 days ago|||

> My understanding is that asymmetric fields provide a migration path in case that happens, as stated in the docs:

If you can assume you can churn a generation of fresh data soonish, and never again read the old data. For RPC sure, but someone like Google has petabytes of stored protobufs, so they don't pretend they can upgrade all the writers.

sdenton4 6 days ago|||

....or we can just say that everything is optional always, and leave it to the servers instead of the protocol to handle irregularities.

summerlight 9/5/2025||||

This seems interesting. Still not sure if `required` is a good thing to have (for persistent data like log you cannot really guarantee some field's presence without schema versioning baked into the file itself) but for an intermediate wire use cases, this will help.

cornstalks 9/5/2025||||

I've never heard of Typical but the fact they didn't repeat protobuf's sin regarding varint encoding (or use leb128 encoding...) makes me very interested! Thank you for sharing, I'm going to have to give it a spin.

zigzag312 9/5/2025||

It looks similar to how vint64 lib encodes varints. Total length of varint can be determined via the first byte alone.

haberman 9/5/2025||

I advocated for PrefixVarint (which seems equivalent to vint64 ) for WebAssembly, but it was decided against, in favor of LEB128: https://github.com/WebAssembly/design/issues/601

The recent CREL format for ELF also uses the more established LEB128: https://news.ycombinator.com/item?id=41222021

At this point I don't feel like I have a clear opinion about whether PrefixVarint is worth it, compared with LEB128.

zigzag312 9/5/2025|||

Just remember that XML was more established than JSON for a long time.

kannanvijayan 7 days ago|||

Varint encoding is something I've peeked at in various contexts. My personal bias is towards the prefix-style, as it feels faster to decode and the segregation of the meta-data from the payload data is nice.

But, the thing that tends to tip the scales is the fact that in almost all real world cases, small numbers dominate - as the github thread you linked relates in a comment.

The LEB128 fast-path is a single conditional with no data-dependencies:

  if ! (x & 0x80) { x }

Modern CPUs will characterize that branch really well and you'll pay almost zero cost for the fastpath which also happens to be the dominant path.

It's hard to beat.

yencabulator 7 days ago||

SQLite format equivalent:

  if x <= 240 { x }

while strictly improving all other aspects (at least IMHO)

https://sqlite.org/src4/doc/trunk/www/varint.wiki

zigzag312 9/5/2025||||

This actually looks quite interesting.

sevensor 9/6/2025||||

Seems like a lot of effort to avoid adding a message version field. I’m not a web guy, so maybe I’m missing the point here, but I always embed a schema version field in my data.

vouwfietsman 9/6/2025|||

I get that.

The point is that its hard to prevent asymmetry in message versions if you are working with many communicating systems. Lets say four services inter-communicate with some protocol, it is extremely annoying to impose a deployment order where the producer of a message type is the last to upgrade the message schema, as this causes unnecessary dependencies between the release trains of these services. At the same time, one cannot simply say: "I don't know this message version, I will disregard it" because in live systems this will mean the systems go out of sync, data is lost, stuff breaks, etc.

There's probably more issues I haven't mentioned, but long story short: in live, interconnected systems, it becomes important to have intelligent message versioning, i.e: a version number is not enough.

kiitos 5 days ago|||

> Lets say four services inter-communicate with some protocol, it is extremely annoying to impose a deployment order where the producer of a message type is the last to upgrade the message schema

i don't know how you arrived at this conclusion

the protocol is the unifying substrate, it is the source of truth, the services are subservient to the protocol, it's not the other way around

also it's not just like each service has a single version, each instance of each service can have separate versions as well!

what you're describing as "annoying" is really just "reality", you can't hand-wave away the problems that reality presents

1718627440 5 days ago||||

> one cannot simply say: "I don't know this message version, I will disregard it" because in live systems this will mean the systems go out of sync, data is lost, stuff breaks, etc.

You already need to deal with lost messages, rejected messages, so just treat this case the same. If you have versions surely you have code to deal with mismatches and e.g. fail back to the older version.

sevensor 9/6/2025|||

I think I see what you’re getting at? My mental model is client and server, but you’re implying a more complex topology where no one service is uniquely a server or a client. You’d like to insert a new version at an arbitrary position in the graph without worrying about dependencies or the operational complexity of doing a phased deployment. The result is that you try to maintain a principled, constructive ambiguity around the message schema, hence asymmetrical fields? I guess I’m still unconvinced and I may have started the argument wrong, but I can see a reasonable person doing it that way.

vouwfietsman 9/6/2025||

Yes thats a big part, but even bigger is just the alignment of teams.

Imagine team A building feature XYZ Team B is building TUV

one of those features in each team deals with messages, the others are unrelated. At some point in time, both teams have to deploy.

If you have to sync them up just to get the protocol to work, thats an extra complexity in the already complex work of the teams.

If you can ignore this, great!

It becomes even more complex with rolling updates though: not all deployments of a service will have the new code immediately, because you want multiple to be online to scale on demand. This creates an immediate necessary ambiguity in the qeustion: "which version does this service accept?" because its not about the service anymore, but about the deployments.

sevensor 9/6/2025||

Ah, I see. Team A would like to deploy a new version of a service. It used to accept messages with schema S, but the new version accepts only S’ and not S. So the only thing you can do is define S’ so that it is ambiguous with S. Team B uses Team A’s service but doesn’t want to have to coordinate deployments with Team A.

I think the key source of my confusion was Team A not being able to continue supporting schema S once the new version is released. That certainly makes the problem harder.

vouwfietsman 9/7/2025||

Exactly!

vineyardmike 9/6/2025|||

Idk I generally think “magic numbers” are just extra effort. The main annoyance is adding if statements everywhere on version number instead of checking the data field you need being present.

It also really depends on the scope of the issue. Protos really excel at “rolling” updates and continuous changes instead of fixed APIs. For example, MicroserviceA calls MicroserviceB, but the teams do deployments different times of the week. Constant rolling of the version number for each change is annoying vs just checking for the new feature. Especially if you could have several active versions at a time.

It also frees you from actually propagating a single version number everywhere. If you own a bunch of API endpoints, you either need to put the version in the URL, which impacts every endpoint at once, or you need to put it in the request/response of every one.

sevensor 9/6/2025||

I think this is only a problem if you’re using a weak data interchange library that can’t use the schema number field to discriminate a union. Because you really shouldn’t have to write that if statement yourself.

atombender 9/6/2025|||

I'm really hoping Typical will catch on, as I quite like the design. One important gap right now is the lack of Go and Python support.

tyleo 9/5/2025|||

We use protocol buffers on a game and we use the back compat stuff all the time.

We include a version number with each release of the game. If we change a proto we add new fields and deprecate old ones and increment the version. We use the version number to run a series of steps on each proto to upgrade old fields to new ones.

swiftcoder 9/5/2025||

> We use the version number to run a series of steps on each proto to upgrade old fields to new ones

It sounds like you've built your own back-compat functionality on top of protobuf?

The only functionality protobuf is giving you here is optional-by-default (and mandatory version numbers, but most wire formats require that)

tyleo 9/5/2025||

Yeah, I’d probably say something more like, “we leverage protobuf built ins to make a slightly more advanced back compat system”

We do rename deprecated fields and often give new fields their names. We rely on the field number to make that work.

vkou 9/6/2025||

> We do rename deprecated fields and often give new fields their names. We rely on the field number to make that work.

Why share names? Wouldn't it be safer to, well, not?

tyleo 5 days ago||

The code becomes hard to read. You might need to change int health to float health. In that case “health” properly describes the idea. We’d change this to int DEPRECATED_health and float health.

Folks can argue that’s ugly but I’ve not seen one instance of someone confused.

jnwatson 9/5/2025|||

ASN.1 implements message versioning in an extremely precise way. Implementing a linter would be trivial.

cryptonector 9/6/2025||

This. Plus ASN.1 is pluggable as to encoding rules and has a large family of them:

  - BER/DER/CER (TLV)
  - OER and PER ("packed" -- no tags and
                 no lengths wherever
                 possible)
  - XER (XML!)
  - JER (JSON!)
  - GSER (textual representation)
  - you can add your own!
    (One could add one based on XDR,
     which would look a lot like OER/PER
     in a way.)

ASN.1 also gives you a way to do things like formalize typed holes.

Not looking at ASN.1, not even its history and evolution, when creating PB was a crime.

StopDisinfo910 7 days ago||

The people who wrote PB clearly knew ASN.1. It was the most famous IDL at the time. Do you assume they just came one morning and decided to write PB without taking a look at what existed?

Anyway, as stated PB does more than ASN.1. It specifies both the description format and the encoding. PB is ready to be used out of the box. You have a compact IDL and a performant encoding format without having to think about anything. You have to remember that PB was designed for internal Google use as a tool to solve their problems, not as a generic solution.

ASN.1 is extremely unwieldy in comparaison. It has accumulated a lot of cruft through the year. Plus they don’t provide a default implementation.

troupo 7 days ago|||

> The people who wrote PB clearly knew ASN.1.

And your assumption is based on what exactly?

> It was the most famous IDL at the time.

Strange that at the same time (2001) people were busy implementing everyting in Java and XML, not ASN.1

> Do you assume they just came one morning and decided to write PB without taking a look at what existed?

Yes, that is a great assumption. Looking at what most companies do, this is an assumption bordering on prescience.

StopDisinfo910 6 days ago||

> Strange that at the same time (2001) people were busy implementing everyting in Java and XML, not ASN.1

Yes. Meanwhile Google was designing an IDL with a default binary serialisation format. And this is not 2025 typical big corp, over staffed, fake HR levels heavy Google we are talking about. That’s Google in its heyday. I think you have answered your own comment.

cryptonector 7 days ago|||

> Do you assume they just came one morning and decided to write PB without taking a look at what existed?

Considering how bad an imitation of 1984 ASN.1 PB's IDL is, and how bad an imitation of 1984 DER PB is, yes I assume that PB's creators did not in fact know ASN.1 well. They almost certainly knew of ASN.1, and they almost certainly did not know enough about it because all the worst mistakes in ASN.1 PB re-created while adding zero new ideas or functionality. It's a terrible shame.

StopDisinfo910 6 days ago||

PB is not a bad imitation of 1984 ASN.1. ASN.1 is choke full of useless representations clearly there to serve what a committee thought the need of the telco industry should be.

I find it funny you are making it looks like a good and pleasant to use IDL. It’s a perfect example of design by committee at its worst.

PB is significantly more space efficient than DER by the way.

yearolinuxdsktp 9/5/2025|||

I agree that saying that no-one uses backwards compatible stuff is bizarre. Rolling deploys, being able to function with a mixed deployment is often worth the backwards compatibility overhead for many reasons.

In Java, you can accomplish some of this with using of Jackson JSON serialization of plain objects, where there several ways in which changes can be made backwards-compatibly (e.g. in the recent years, post-deserialization hooks can be used to handle more complex cases), which satisfies (a). For (b), there’s no automatic linter. However, in practice, I found that writing tests that deserialize prior release’s serialized objects get you pretty far along the line of regression protection for major changes. Also it was pretty easy to write an automatic round-trip serialization tester to catch mistakes in the ser/deser chain. Finally, you stay away from non-schemable ser/deser (such as a method that handles any property name), which can be enforced w/ a linter, you can output the JSON schema of your objects to committed source. Then any time the generated schema changes, you can look for corresponding test coverage in code reviews.

I know that’s not the same as an automatic linter, but it gets you pretty far in practice. It does not absolve you from cross-release/upgrade testing, because serialization backwards-compatibility does not catch all backwards-compatibility bugs.

Additionally, Jackson has many techniques, such as unwrapping objects, which let you execute more complicated refactoring backwards-compatibly, such as extracting a set of fields into a sub-object.

I like that the same schema can be used to interact with your SPA web clients for your domain objects, giving you nice inspectable JSON. Things serialized to unprivileged clients can be filtered with views, such that sensitive fields are never serialized, for example.

You can generate TypeScript objects from this schema or generate clients for other languages (e.g. with Swagger). Granted it won’t port your custom migration deserialization hooks automatically, so you will either have to stay within a subset of backwards-compatible changes, or add custom code for each client.

You can also serialize your RPC comms to a binary format, such as Smile, which uses back-references for property names, should you need to reduce on-the-wire size.

It’s also nice to be able to define Jackson mix-ins to serialize classes from other libraries’ code or code that you can’t modify.

mattnewton 9/5/2025|||

Exactly, I think of protobuffers like I think of Java or Go - at least they weren’t writing it in C++.

Dragging your org away from using poorly specified json is often worth these papercuts IMO.

const_cast 9/5/2025|||

Protobufs are better but not best. Still, by far, the easiest thing to use and the safest is actual APIs. Like, in your application. Interfaces and stuff.

Obviously if your thing HAS to communicate over the network that's one thing, but a lot of applications don't. The distributed system micro service stuff is a choice.

Guys, distributed systems are hard. The extremely low API visibility combined with fragile network calls and unsafe, poorly specified API versioning means your stuff is going to break, and a lot.

Want a version controlled API? Just write in interface in C# or PHP or whatever.

motorest 9/6/2025||

> Protobufs are better but not best.

This sort of comments doesn't add anything to the discussion unless you are able to point out what you believe to be the best. It reads as an unnecessary and unsubstantiated put-down.

const_cast 6 days ago||

I... did.

anonymousiam 9/5/2025|||

The original RPC code, from which Google derived their protobuf stuff was written in (pre-ANSI) C at Sun Microsystems.

tshaddox 9/5/2025|||

> Name another serialization declaration format that both (a) defines which changes can be make backwards-compatibly, and (b) has a linter that enforces backwards compatible changes.

The article covers this in the section "The Lie of Backwards- and Forwards-Compatibility." My experience working with protocol buffers matches what the author describes in this section.

the__alchemist 9/6/2025|||

This is always the thing to look for; "What are the alternatives?", and/why aren't there better ones.

I don't understand most use cases of protobufs, including ones that informed their design. I use it for ESP-hosted, to communicate between two MCUs. It is the highest-friction serialization protocol I've seen, and is not very byte-efficient.

Maybe something like the specialized serialization libraries (bincode, postcard etc) would be easier? But I suspect I'm missing something about the abstraction that applies to networked systems, beyond serialization.

tgma 9/5/2025|||

> And I know the article says no one uses the backwards compatible stuff but that’s bizarre to me – setting up N clients and a server that use protocol buffers to communicate and then being able to add fields to the schema and then deploy the servers and clients in any order is way nicer than it is with some other formats that force you to babysit deployment order.

Yet the author has the audacity to call the authors of protobuf (originally Jeff Dean et al) "amateurs."

jcgrillo 9/5/2025|||

As someone who has written many mapreduce jobs over years old protobufs I can confidently report the backwards compatibility made it possible at all.

noitpmeder 9/5/2025|||

Not that I love it -- but SBE (Simple Binary Encoding) is a _decent_ solution in the realm of backwards/forwards compatibility.

maximilianburke 9/5/2025|||

Flatbuffers satisfies those requirements and doesn’t have varint shenanigans.

leoc 9/5/2025|||

What about Cap’n Proto https://capnproto.org/ ? (Don't know much about these things myself, but it's a name that usually comes up in these discussions.)

usrnm 9/6/2025||

Cap'n'proto is not very nice to work with in C++, and I'd discourage anyone from using it from other programming languages, the implementations are just not there yet. We use both cnp and protobufs at work, and I vastly prefer protobufs, even for C++. I only wish they stayed the hell away from abseil, though.

yencabulator 7 days ago|||

The developer experience of capnproto is pretty darn miserable. I replaced my Rust use of it with https://rkyv.org/ -- probably the biggest ergonomic improvement was a single validation after which the message is safe to look at, instead of errors on every code path. The biggest downside was loss of built-in per-message schema evolution; in my use case I can have one version number up front.

porridgeraisin 9/6/2025|||

I always thought people had a positive view on abseil, never used it myself other than when tinkering on random projects. What's the main issue?

usrnm 9/6/2025||

The thing is a huge pain to manage as a dependency, especially if you wander away from the official google-approved way of doing things. Protobuf went from a breeze to use to the single most common source of build issues in our cross-platform project the moment they added this dependency. It's so bad that many distros and package managers keep the pre-abseil version as a separate package, and many just prefer to get stuck with it rather than upgrade. Same with other google libraries that added abseil as a dependency, as far as I'm aware

mkoubaa 9/6/2025|||

I'd rather they just used the abseil headers they needed with the abseil license at the top than make it a build dependency.

The concept of a package is antithetical to C++ and no amount of tooling can fix that.

usrnm 9/6/2025||

abseil is not header-only, though

mkoubaa 7 days ago||

Skill issue

jjmarr 9/6/2025|||

I like abseil besides the compile times. Not having to specialize my own hash when using maps is nice.

AYBABTME 9/6/2025|||

But you can't trust flatbuffers sent from unknown senders.

motorest 9/6/2025|||

> Just with those two criteria you’re down to, like, six formats at most, of which Protocol Buffers is the most widely used.

What I dislike the most about blog posts like this is that, although the blogger is very opinionated and critical of many things, the post dates back to 2018, protobuf is still dominant, and apparently during all these years the blogger failed to put together something that they felt was a better way to solve the problem. I mean, it's perfectly fine if they feel strongly about a topic. However, investing so much energy to criticize and even throw personal attacks on whoever contributed to the project feels pointless and an exercise in self promotion at the expense of shit-talking. Either you put something together that you feel implements your vision and rights some wrongs, or don't go out of your day to put down people. Not cool.

ardit33 9/6/2025|||

JSON exists, and when compressed it is pretty efficient. (not as efficient as protobuff though).

For client facing protocol Protobufs is a nightmare to use. For Machine to Machine services, it is ok-ish, yet personally I still don't like it.

When I was at Spotify we ditched it for client side apis (server to mobile/web), and never looked back. No one liked working with it.

motorest 9/6/2025||

> JSON exists (...)

The blog post leads with the personal assertion that "ad-hoc and built by amateurs". Therefore I doubt that JSON, a data serialization language designed by trimming most of JavaScript out and to be parses with eval(), would meet the opinionated high bar.

Also, JSON is a data interchange language, and has no support for types beyond the notoriously ill-defined primitives. In contrast, protobuf is a data serialization language which supports specifying types. This means that for JSON, to start to come close to meet the requirements met by protobuf, would need to be paired with schema validation frameworks and custom configurable parsers. Which it definitely does not cover.

ardit33 7 days ago||

You must be young. XML and XML Schemas existed before JSON or Protobuf, and people ditched them for a good reason and JSON took over.

Protobuf is just another version of the old RPC/Java Beans, etc... of a binary format. Yes, it is more efficient data wise than JSON, but it is a PITA to work on and debug with.

motorest 7 days ago||

> You must be young. XML and XML Schemas existed before JSON or Protobuf, and people ditched them for a good reason and JSON took over.

I'm not sure you got the point. It's irrelevant how old JSON or XML (a non sequitur) are. The point is that one of the main features and selling points of protobuf is strong typing and model validation implemented at the parsing level. JSON does not support any of these, and you need to onboard more than one ad-hoc tool to have a shot at feature parity, which goes against the blogger's opinionated position on the topic.

naikrovek 9/7/2025|||

TLV style binary formats are all you need. The “Type” in that acronym is a 32-bit number which you can use to version all of your stuff so that files are backwards compatible. Software that reads these should read all versions of a particular type and write only the latest version.

Code for TLV is easy to write and to read, which makes viewing programs easy. TLV data is fast for computers to write and to read.

Protobuf is overused because people are fucking scared to death to write binary data. They don’t trust themselves to do it, which is just nonsense to me. It’s easy. It’s reliable. It’s fast.

oftenwrong 7 days ago||

Protobuf is typically serialised using a TLV-style encoding.

https://protobuf.dev/programming-guides/encoding/

A major value of protobuf is in its ecosystem of tools (codegen, lint, etc); it's not only an encoding. And you don't generally have to build or maintain any of it yourself, since it already exists and has significant industry investment.

mgaunard 9/5/2025|||

in the systems I built I didn't bother with backwards compatibility.

If you make any change, it's a new message type.

For compatibility you can coerce the new message to the old message and dual-publish.

o11c 9/5/2025|||

I prefer a little builtin backwards (and forwards!) compatibility (by always enforcing a length for each object, to be zero-padded or truncated as needed), but yes "don't fear adding new types" is an important lesson.

jimbokun 9/5/2025|||

That only works if you control all the clients.

mgaunard 9/5/2025||

Dual-publishing makes it transparent to older clients.

Obviously you need to track when the old clients have been moved over so you can eventually retire the dual-publishing.

You could also do the conversion on the receiving side without a-priori information, but that would be extremely slow.

dlahoda 9/7/2025|||

https://github.com/dfinity/candid/blob/master/spec/Candid.md

orochimaaru 9/6/2025|||

Protobufs aren’t new. They’re really just rpc over https. I’ve used dce-rpc in 1997 which had IDL. I believe CORBA used IDL as well although I personally did not use it. There have been other attempts like ejb, etc. which are pretty much the same paradigm.

The biggest plus with protobuf is the social/financial side and not the technology side. It’s open source and free from proprietary hacks like previous solutions.

Apart from that, distributed systems of which rpc is a sub topic are hard in general. So the expectation would be that it sucks.

stickfigure 9/5/2025|||

Backwards compatibility is just not an issue in self-describing structures like JSON, Java serialization, and (dating myself) Hessian. You can add fields and you can remove fields. That's enough to allow seamless migrations.

It's only positional protocols that have this problem.

dangets 9/5/2025|||

You can remove JSON fields at the cost of breaking your clients at runtime that expect those fields. Of course the same can happen with any deserialization libraries, but protobufs at least make it more explicit - and you may also be more easily able to track down consumers using older versions.

nomel 9/5/2025||

For the missing case, whenever I use json, I always start with a sane default struct, then overwrite those with the externally provided values. If a field is missing, it will be handled reasonably.

jimbokun 9/5/2025|||

At the cost of much larger payloads.

stickfigure 9/5/2025||

With gzip encoding... not really.

mkoubaa 9/6/2025|||

Real ones know that serialization is what sucks.

tomrod 9/5/2025||

> Name another serialization declaration format that both (a) defines which changes can be make backwards-compatibly, and (b) has a linter that enforces backwards compatible changes.

ASCII text (tongue in cheek here)

summerlight 9/5/2025||

https://news.ycombinator.com/item?id=18190005

Just FYI: an obligatory comment from the protobuf v2 designer.

Yeah, protobuf has lots of design mistakes but this article is written by someone who does not understand the problem space. Most of the complexity of serialization comes from implementation compatibility between different timepoints. This significantly limits design space.

thethimble 9/5/2025||

Relatedly, most of the author's concerns are solved by wrapping things in a message.

> oneof fields can’t be repeated.

Wrap oneof field in message which can be repeated

> map fields cannot be repeated.

Wrap in message which can contain repeated fields

> map values cannot be other maps.

Wrap map in message which can be a value

Perhaps this is slightly inconvenient/un-ergonomic, but the author is positioning these things as "protos fundamentally can't do this".

evanmoran 9/6/2025|||

To clarify. Protobuf’s simplest change is adding a field to a message so wrapping maps of maps, maps of fields, oneof fields into a message makes these play to its strengths. It feels like over engineering to turn your Inventory map of items into a Inventory message, but you will be grateful for it when you need a capacity field later.

missinglugnut 9/5/2025||

>Most of the complexity of serialization comes from implementation compatibility between different timepoints.

The author talks about compatibility a fair bit, specifically the importance of distinguishing a field that wasn't set from one that was intentionally set to a default, and how protobuffs punted on this.

What do you think they don't understand?

summerlight 9/5/2025||

If you see some statements like below on the serialization topic:

> Make all fields in a message required. This makes messages product types.

> One possible argument here is that protobuffers will hold onto any information present in a message that they don't understand. In principle this means that it's nondestructive to route a message through an intermediary that doesn't understand this version of its schema. Surely that's a win, isn't it?

> Granted, on paper it's a cool feature. But I've never once seen an application that will actually preserve that property.

Then it is fair to raise eyebrows on the author's expertise. And please don't ask if I'm attached to protobuf; I can roast the protocol buffer on its wrong designs for hours. It is just that the author makes series of wrong claims presumably due to their bias toward principled type systems and inexperience of working on large scale systems.

instig007 9/6/2025||

> If you see some statements like below on the serialization topic:

> Make all fields in a message required. This makes messages product types.

> Then it is fair to raise eyebrows on the author's expertise.

It's fair to raise eyebrows on your expertise, since required fields don't contribute to b/w incompatibility at all, as every real-world protocol has a mandatory required version number that's tied to a direct parsing strategy with strictly defined algebra, both for shrinking (removing data fragments) and growing (introducing data fragments) payloads. Zero-values and optionality in protobuf is one version of that algebra, it's the most inferior one, subject to lossy protocol upgrades, and is the easiest one for amateurs to design. Then, there's next lavel when the protocol upgrade is defined in terms of bijective functions and other elements of symmetric groups that can tell you whether the newly announced data change can be carried forward (new required field) or dropped (removed field) as long as both the sending and receiving ends are able to derive new compound structures from previously defined pervasive types (the things the protobuf says are oneoffs and messages, for example).

xyzzyz 9/6/2025|||

What you describe using many completely unnecessary mathematical terms is not only not found in “every real-world protocol”, but in fact is something virtually absent from overwhelming majority of actually used protocols, with a notable exception of the kind of protocol that gets a four digit numbered RFC document that describes it. Believe it or not, but in the software industry, nobody is defining a new “version number” with “strictly defined algebra” when they want to add a new field to an communication protocol between two internal backend services.

instig007 9/6/2025|||

> What you describe using many completely unnecessary mathematical terms

Unnecessary for you, surely.

> Believe it or not, but in the software industry, nobody is defining a new “version number” with “strictly defined algebra” when they want to add a new field to an communication protocol between two internal backend services.

Name a protocol that doesn't have a respective version number, or without the defined algebra in terms of the associated spec clarifications that accompany the new version. The word "strictly" in "strictly defined algebra" has to do with the fact that you cannot evolve a protocol without strictly publishing the changed spec, that is you're strictly obliged to publish a spec, even the loosely defined one, with lots of omissions and zero-values. That's the inferior algebra for protobuf, but you can think it is unnecessary and doesn't exist.

tptacek 9/6/2025||

Instead of just handwaving about whether it's necessary or not, why not point to any protocol that relies on that attribute, and we can then evaluate how important that protocol is?

porridgeraisin 9/6/2025|||

Yeah. And for anyone curious about the actual content hidden under the jargon-kludge-FP-nerd parent comment, here's my attempt at deciphering it.

They seem to be saying that you have to publish code that can change a type from schema A to schema B... And back, whenever you make a schema B. This is the "algebra". The "and back" part makes it bijective. You do this at the level of your core primitive types so that it's reused everywhere. This is what they meant by "pervasive" and it ties into the whole symmetric groups thing.

Finally, it seems like when you're making a lossy change, where a bijection isn't possible, they want you to make it incompatible. i.e, if you replaced address with city, then you cannot decode the message in code that expects address.

summerlight 9/6/2025||||

> since required fields don't contribute to b/w incompatibility at all, as every real-world protocol has a mandatory required version number that's tied to a direct parsing strategy with strictly defined algebra

At least I know 10 different tech companies with billion dollars revenue which does not suit to your description. This comment makes me wonder if you have any experience of working on real world distributed systems. Oh and I'm pretty sure you did not read Kenton's comment; he already precisely addressed your point:

> This is especially true when it comes to protocols, because in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse.

I recommend you to do your homework before making such a strong argument. Reading a 5 mins long comment is not that hard. You can avoid lots of shame by doing so.

b_e_n_t_o_n 9/6/2025|||

Is this satire?

xyzzyz 9/6/2025||

Granted, on paper it’s a cool feature. But I’ve never once seen an application that will actually preserve that property.

Chances are, the author literally used software that does it as he wrote these words. This feature is critical to how Chrome Sync works. You wouldn’t want to lose synced state if you use an older browser version on another device that doesn’t recognize the unknown fields and silently drops them. This is so important that at some point Chrome literally forked protobuf library so that unknown fields are preserved even if you are using protobuf lite mode.

xmddmx 9/5/2025||

I share the author's sentiment. I hate these things.

True story: trying to reverse engineer macOS Photos.app sqlite database format to extract human-readable location data from an image.

I eventually figured it out, but it was:

A base64 encoded Binary Plist format with one field containing a ProtoBuffer which contained another protobuffer which contained a unicode string which contained improperly encoded data (for example, U+2013 EN DASH was encoded as \342\200\223)

This could have been a simple JSON string.

tgma 9/5/2025||

> This could have been a simple JSON string.

There's nothing "simple" about parsing JSON as a serialization format.

Zambyte 9/6/2025|||

Having attempted writing a JSON parser from scratch and a protobuf parser from scratch and only completing one of them, I disagree.

wvenable 9/5/2025|||

Except that most often you can just look at it and figure it out.

tgma 9/5/2025|||

Sure you can look at it[1], but you're not expected to look at Apple Photos database. The computer is.

Write a correct JSON parser, compare with protobuf on various metrics, and then we can talk.

[1]: although to be fair, I am older than kids whose first programming language was JavaScript, so I do not think of JSON object format with property names in quotes and integers that need to be wrapped as strings to be safe, etc., lack of comma after the last entry--to be fair this last one is a problem in writing, not reading JSON--as the most natural thing

wvenable 9/5/2025||

I'm also "older" but I don't think that means anything.

> Sure you can look at it[1], but you're not expected to look at Apple Photos database.

How else are you supposed to figure it out? If you're older then you know that you can't rely on the existence or correctness of documentation. Being able to look at JSON and understand it as a human on the wire is huge advantage. JSON being pretty simple in structure is as advantage. I don't see a problem with quoting property names! As for large integers and datetimes, yes that could be much better designed. But that's true of every protocol and file format that has any success.

JSON parsers and writers are common and plentiful and are far less crazy than any complete XML parser/writer library.

tgma 9/6/2025||

> Being able to look at JSON and understand it as a human on the wire is huge advantage

I don’t think this is a given at all. Depends on the context. I think it’s often overvalued. A lot of times the performance matters more. If human readability was the only thing that mattered, I would still not count JSON as the winner. You will have to pipe it to jq, realistically. You’d do the same for any other serialization format too. Inside Google where proto is prevalent, that is just as easy if not more convenient.

The point is how hard or easy it is for an app’s end user to decipher its file database is not a design goal for the serialization library chosen by Apple Photos developers here. The constraints and requirements are all on different axis.

IshKebab 9/5/2025|||

Sure but unless you want to embed an LLM in every JSON library, computers can't do that.

bobbylarrybobby 9/5/2025|||

https://github.com/RhetTbull/osxphotos

fluoridation 9/5/2025|||

I mean... you can nest-encode stuff in any serial format. You're not describing a problem either intrinsic or unique to Protobuf, you're just seeing the development org chart manifested into a data structure.

xmddmx 9/5/2025||

Good points this wasn't entirely a protobuf-specific issue, so much as it was a (likely hierarchical and historical set of) bad decisions to use it at all.

Using Protobuffers for a few KB of metadata, when the photo library otherwise is taking multiple GB of data, is just pennywise pound foolish.

Of course, even my preference for a simple JSON string would be problematic: data in a database really should be stored properly normalized to a separate table and fields.

My guess is that protobuffers did play a role here in causing this poor design. I imagine this scenario:

- Photos.app wants to look up location data

- the server returns structured data in a ProtoBuffer

- there's no easy or reasonable way to map a protobuf to database fields (one point of TFA)

- Surrender! just store the binary blob in SQLITE and let the next poor sod deal with it

tgma 9/5/2025||

You have to take into account the fact that iPhoto app has had many iterations. The binary plist stuff is very likely the native NSArchive "object archiving (serialization)" that is done by Obj-C libraries. They probably started using protobuf at some point later after iCloud. I suspect the unicode crap you are facing may even predate Cocoaization of the app (they probably used Carbon API).

So it would make it a set of historical decisions, but I am not convinced they are necessarily bad decisions given the constraints. Each layer is likely responsible for handing edge cases in the application that you and I are not privy to.

pjjpo 9/6/2025|||

The JSON version would have also had the wrong encoding - all formats are just a framing for data fed in from code written by a human. In mac's case, em dash will always be an issue because that's just what Mac decided on intentionally.

seanw444 9/5/2025|||

That's horrendous. For some reason I imagine Apple's software to be much cleaner, but I guess that's just the marketing getting to my head. Under the hood it's still the same spaghetti.

ninkendo 9/6/2025||

Yeah, the problem is Apple and all the other contemporary tech companies have engineers bounce around between them all the time, and they take their habits with them.

At some point there becomes a critical mass of xooglers in an org, and when a new use case happens no one bothers to ask “how is serialization typically done in Apple frameworks”, they just go with what they know. And then you get protobuf serialization inside a plist. (A plist being the vanilla “normal” serialization format at Apple. Protobuf inside a plist is a sign that somebody was shoehorning what they’re comfortable with into the code.)

05 9/6/2025||

It that's any consolation, in the current version's schema they are just plain ZLATITUDE FLOAT, ZLONGITUDE FLOAT in ZASSET table..

xg15 9/6/2025||

I'm starting to wonder if some of those bad design decisions are symptoms of a larger "cultural bias" at Google. Specifically the "No Compositionality" point: It reminds me of similar bad designs in Go, CSS and the web platform at large.

The pattern seems to be that generalized, user-composable solutions are discouraged in favor of a myriad of special constructs that satisfy whatever concrete use cases seem relevant for the designers in the moment.

This works for a while and reduces the complexity of the language upfront, while delivering results - but over time, the designs devolve into a rats's nest of hyperspecific design features with awkward and unintuitive restrictions.

Eventually, the designers might give up and add more general constructs to the language - but those feel tacked on and have to coexist with specific features that can't be removed anymore.

senorrib 9/6/2025||

It works both ways. General constructs tend to become overly abstract and you end up with sneaky errors in different places due to a minor change to an abstraction.

Like the old adage, this is just a matter of preference. Good software engineering requires, first and foremost, great discipline, regardless of the path or tool you choose.

gettingoverit 9/6/2025||

If there are errors in implementation of general constructs, they tend to be visible at their every use, and get rapidly fixed.

Some general constructs are better than the others, because they have an algebraic theory behind them, and sometimes that theory was already researched for a few hundred years.

For example, product/coproduct types mentioned in the article are quite close to addition and multiplication that we've all learned in school, and obey the same laws.

So there are several levels where the choice of ad-hoc constructs is wrong, and in the end the only valid reason to choose them is time constraints.

If they had 24 years to figure out how to do it properly, but they didn't, the technology is just dead.

sdenton4 9/6/2025||

Hm, that's idealistic...

I've certainly run into cases where small changes in general systems led to hard-to-detect bugs, which took a great deal of investigation to figure out. Not all failures are catastrophic.

The technology is quite alive, which is why it hasn't been 'fixed' - changing the wheels on a moving car, and all that. The actual disappointment is that a better alternative hasn't taken off in the six years since this post was written... If its so easy, where's the alternatives?

gettingoverit 4 days ago||

That's not idealistic, that's how arithmetics work. If you use the same generic thing more times, you have the higher chance of discovering it broken. The fact that you've run into cases means that chance is never zero, and is irrelevant to the discussion.

As was already mentioned in the article, PB solve a problem that likely only Google has, even if that. State of the art nowadays is JSON/JSONL. If it grows too large, gzip it.

When someone is using third-party closed proprietary technologies to be "not like the rest", it usually doesn't work that well for their business.

The technology is "alive" until it didn't follow the path of Closure, GWT, and the rest of "we use it on the most loaded page of the world" technology. PB will be on the same graveyard soon.

lelanthran 7 days ago||

> This works for a while and reduces the complexity of the language upfront, while delivering results - but over time, the designs devolve into a rats's nest of hyperspecific design features with awkward and unintuitive restrictions.

But that's true for almost anything, though.

jsnell 9/5/2025||

Discussed many times over the years:

https://news.ycombinator.com/item?id=18188519 (299 comments)

https://news.ycombinator.com/item?id=21871514 (215 comments)

https://news.ycombinator.com/item?id=35281561 (59 comments)

tptacek 9/5/2025|

There are a lot of great comments on these old threads, and I don't think there's a lot of new science in this field since 2018, so the old threads might be a better read than today's.

Here's a fun one:

https://news.ycombinator.com/item?id=21873926

bithive123 9/5/2025||

I don't know if the author is right or wrong; I've never dealt with protobufs professionally. But I recently implemented them for a hobby project and it was kind of a game-changer.

At some stage with every ESP or Arduino project, I want to send and receive data, i.e. telemetry and control messages. A lot of people use ad-hoc protocols or HTTP/JSON, but I decided to try the nanopb library. I ended up with a relatively neat solution that just uses UDP packets. For my purposes a single packet has plenty of space, and I can easily extend this approach in the future. I know I'm not the first person to do this but I'll probably keep using protobufs until something better comes along, because the ecosystem exists and I can focus on the stuff I consider to be fun.

tliltocatl 9/6/2025||

Embedded/constrained UDP is where protobuf wire format (but not google's libraries) rocks: IoT over cellular and such, where you need to fit everything into a single datagram (number of roundtrips is what determines power consumption). As to those who say "UDP is unreliable" - what you do is you implement ARQ on the application level. Just like TCP does it, except you don't have to waste roundtrips on SYN-SYN-ACK handshake nor waste bytes on sending data that are no longer relevant.

Varints for the win. Send time series as columns of varint arrays - delta or RLL compression becomes quite straightforward. And as a bonus I can just implement new fields in the device and deploy right away - the server-side support can wait until we actually need it.

No, flatbuffers/cap'n'proto are unacceptably big because of fixed layout. No, CBOR is an absolute no go - why on earth would you waste precious bytes on schema every time? No, general-purpose compression like gzip wouldn't do much on such a small size, it will probably make things worse. Yes, ASN is supposed to be the right solution - but there is no full-featured implementation that doesn't cost $$$$ and the whole thing is just too damn bloated.

Kinda fun that it sucks for what it is supposed to do, but actually shines elsewhere.

henningpeters 9/6/2025|||

> why on earth would you waste precious bytes on schema every time

cbor doesn't prescribe sending schema, in fact there is no schema, like json.

i just switched from protobuf to cbor because i needed better streaming support and find use it quite delightful. losing protobuf schema hurts a bit, but the amount of boilerplate code is actually less than what i had before with nanopb (embedded context). on top of it, i am saving approx. 20% in message size compared to protobuf bc i am using mostly arrays with fixed position parameters.

tliltocatl 9/6/2025||

> cbor doesn't prescribe sending schema, in fact there is no schema, like json.

You are right, I must have confused CBOR with BSON where you send field names as strings.

>on top of it, i am saving approx. 20% in message size compared to protobuf bc i am using mostly arrays with fixed position parameters

Arrays with fixed position is always going to be the most compact format, but that means that you essentially give up on serialization. Also, when you have a large structure (e. g. full set of device state and settings)where most of the fields only change infrequently, it makes sense to only send what's changed, and then TLV is significantly better.

cryptonector 9/6/2025||||

> Yes, ASN is supposed to be the right solution - but there is no full-featured implementation that doesn't cost $$$$ and the whole thing is just too damn bloated.

Oh for crying out loud! PB had ZERO tooling available when it was created! It would have been much easier to create ASN.1 tooling w/ OER/PER and for some suitable subset of ASN.1 in 2001 that it was to a) create an IDL, b) create an encoding, and c) write tooling for N programming languages.

In fact, one thing one could have done is write a transpiler from the IDL to an AST that does all linting, analysis, and linking, and which one can then use to drive codegen for N languages. Or even better: have the transpiler produce a byte-coded representation of the modules and then for each programming language you only need to codegen the types but not the codecs -- instead for each language you need only write the interpreter for the byte-coded modules. I know because I've extended and maintained an [open source] ASN.1 compiler that fucking does [some of] these things.

Stop spreading this idea that ASN.1 is bloated. It's not. You can cut it down for your purposes. There's only 4 specifications for the language itself, of which the base one (x.680) is enough for almost everything (the others, X.681, X.682, and X.683, are mainly for parameterized types and formal typed hole specifications [the ASN.1 "information object system], which are awesome but you can live without). And these are some of the best-written and most-readable specifications ever written by any standards development organization -- they are a great gift from a few to all of mankind.

tliltocatl 9/6/2025||

> It would have been much easier to create ASN.1 tooling w/ OER/PER and for some suitable subset of ASN.1 in 2001

Just by looking at your past comments - I agree that if google reused ASN.1, we would have lived in a better world. But the sad reality now is that PB gots tons of FOSS tooling and ASN.1 barely any (is there any free embedded-grade implementation other than asn1cc?) and figuring out what features you can use without having to pledge your kidney and soul to Nokalva is a bit hard.

I tried playing with ASN.1 before settling on protobuf. Don't recall which compiler I used, but immediately figured out that apparently datetime datatype is not supported, and the generated C code was bloated mess (so is google's protobuf - but not nanopb). Protobuf, on the other hand, was quite straightforward on what is and is not supported. So us mortals who aren't google and have a hard time justifying writing serdes from scratch gotta use what's available.

> Stop spreading this idea that ASN.1 is bloated

"Bloated" might be the wrong word - but it is large and it's damn hard for someone designing a new application to figure out which part is safe to use, because most sources focus on using it for decoding existing protocols.

cryptonector 9/7/2025||

For sure PB is a fact of life now. A regrettable fact of life, but perhaps a lesson (that few will heed).

grogers 9/6/2025|||

Other than ASN.1 PER, is there any other widely used encoding format that isn't self-describing? Using TLV certainly adds flexibility around schema evolution, but I feel like collectively we are wasting a fair amount of bytes because of it...

tliltocatl 7 days ago|||

Cap'n'proto doesn't have tags, but it wastes even more bytes in favor of speed. Than again, omitting tags only saves space if you are sending all the fields every time. PER uses a bitmap, which is still a bit wasteful on large sparse structs.

cryptonector 7 days ago||

PER sends a bitmap only of OPTIONAL members' (fields') presence/absence. Required members are just where you expect them: right after their preceding members.

cryptonector 7 days ago||||

Also JSOON and XML are not TLV, though of course they're not really good examples of non-TLV encodings -- certainly they can't be what you had in mind.

cryptonector 7 days ago|||

OER (related to PER)

XDR (ONC RPC, NFS)

MS RPC (DCE RPC w/ tweaks)

Flat Buffers

akazantsev 9/6/2025|||

You can also send JSON over UDP. Wiz smart bulbs do this for communication.

https://github.com/sbidy/pywizlight?tab=readme-ov-file#examp...

_zoltan_ 9/5/2025||

and since it's UDP, if it's lost it's lost. and since it's not standard http/JSON, nobody will have a clue in a year and can't decode it.

to learn and play with it it's fine, else why complicate life?

Farmadupe 9/5/2025||

Using protobuf is practical enough in embedded. This person isn't the first and won't be the last. Way faster than JSON, way slower than C structs.

However protobuf is ridiculously interchangeable and there are serializers for every language. So you can get your interfaces fleshed out early in a project without having to worry that someone will have a hard time ingesting it later on.

Yes it's a pain how an empty array is a valid instance of every message type, but at least the fields that you remember to send are strongly typed. And field optionality gives you a fighting chance that your software can still speak to the unit that hasn't been updated in the field for the last five years.

On the embedded side, nanopb has worked well for us. I'm not missing having to hand maintain ad-hoc command parsers on the embedded side, nor working around quirks and bugs of those parsers on the desktop side

ndr 9/5/2025||

Not even before the first line ends you get "They’re clearly written by amateurs".

This is a rage bait, not worth the read.

btilly 9/5/2025||

The reasons for that line get at a fundamental tension. As David Wheeler famously said, "All problems in computer science can be solved by another level of indirection, except for the problem of too many indirections."

Over time we accumulate cleverer and cleverer abstractions. And any abstraction that we've internalized, we stop seeing. It just becomes how we want to do things, and we have no sense of what cost we are imposing with others. Because all abstractions leak. And all abstractions pose a barrier for the maintenance programmer.

All of which leads to the problem that Brian Kernighan warned about with, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?" Except that the person who will have to debug it is probably a maintenance programmer who doesn't know your abstractions.

One of the key pieces of wisdom that show through Google's approaches is that our industry's tendency towards abstraction is toxic. As much as any particular abstraction is powerful, allowing too many becomes its own problem. This is why, for example, Go was designed to strongly discourage over-abstraction.

Protobufs do exactly what it says on the tin. As long as you are using them in the straightforward way which they are intended for, they work great. All of his complaints boil down to, "I tried to do some meta-manipulation to generate new abstractions, and the design said I couldn't."

That isn't the result of them being written by amateurs. That's the result of them being written to incorporate a piece of engineering wisdom that most programmers think that they are smart enough to ignore. (My past self was definitely one of those programmers.)

Can the technology be abused? Do people do stupid things with them? Are there things that you might want to do that you can't? Absolutely. But if you KISS, they work great. And the more you keep it simple, the better they work. I consider that an incentive towards creating better engineered designs.

b_e_n_t_o_n 9/6/2025||

I think you nailed it. So many complaints about Go for example basically come down to "it didn't let me create X abstraction" and that's basically the point.

jilles 9/5/2025|||

The best way to get your point across is by starting with ad-hominem attacks to assert your superior intelligence.

instig007 9/6/2025|||

Yeah, let's pretend that type algebra doesn't exist, and even if it does exist then it's not useful and definitely isn't practical in data protocols. Let's believe that the authors of protobuf considered everything, and since they aren't amateurs (by the virtue of having worked on protobuf at Google, presumably), every elaborated opinion that draws them as amateurs at applying type algebra in data protocol designs is a personal ad-hominem attack.

tptacek 9/6/2025||

They're not amateurs by virtue of being some of the most senior engineers ever to work at Google. You don't get to play the "ad hominem" card while calling them names. This whole thread is embarrassing.

instig007 7 days ago|||

Ok, "some of the most senior engineers ever to work at Google" don't seem to know that static bounds checking don't require dependent types: https://news.ycombinator.com/item?id=45150008

> You don't get to play the "ad hominem" card while calling them names

The entire article explains it at length why there's the impression, it's not ad-hominem.

tptacek 7 days ago||

Previous threads on this story have spelled out specifically which Googlers were behind this design, and, again, it's embarrassing that anybody is trying to defend the hill of "protobuf's designers were amateurs". You can keep digging in if you want.

b_e_n_t_o_n 9/6/2025|||

[flagged]

tshaddox 9/5/2025||||

IMO it's a pretty reasonable claim about experience level, not intelligence, and isn't at all an ad hominem attack because it's referring directly to the fundamental design choices of protocol buffers and thus is not at all a fallacy of irrelevance.

compiler-guy 9/6/2025||

Whatever else Jeff Dean and Sanjay Ghemawat are, and whatever mistakes they made in designing protobufs, they are not amateurs.

Not long after they designed and implemented protobuffers, they shared the ACM prize in computing, as well as many other similar honors. And the honors keep stacking up.

None of this means that protobufs are perfect (or even good), but it does mean they weren't amateurs when they did it.

https://en.wikipedia.org/wiki/Jeff_Dean

https://en.wikipedia.org/wiki/Sanjay_Ghemawat

notmyjob 9/5/2025||||

I disagree, unless you are in the majority.

perching_aix 9/5/2025|||

Is this in reference to the blogpost, the comment above, or your own comment? Cause it honestly works for all of them.

sieabahlpark 9/5/2025||

[dead]

BugsJustFindMe 9/5/2025|||

If only the article offered both detailed analyses of the problems and also solutions. Wait, it does! You should try reading it.

pphysch 9/5/2025|||

Where's the download link for the solution? I must have missed it.

kiitos 9/5/2025|||

it does not

jeffbee 9/5/2025|||

Yep, the article opens with a Hall of Fame-grade compound fallacy: a strawman refutation of a hypothetical ad hominem that nobody has argued.

You can kinda see how this author got bounced out of several major tech firms in one year or less, each, according to their linkedin.

omnicognate 9/5/2025||

It's a terrible attitude and I agree that sort of thing shouldn't be (and generally isn't) tolerated for long in a professional environment.

That said the article is full of technical detail and voices several serious shortcomings of protobuf that I've encountered myself, along with suggestions as to how it could be done better. It's a shame it comes packaged with unwarranted personal attacks.

TZubiri 9/6/2025|||

> if (m_foo = null)

Imagine calling google amateurs, and then the only code you write has a first year student error in failing to distinguish assignment from comparision operator.

There's a class of rant on the internet where programmers complain about increasingly foundational tech instead of admitting skill issues. If you go far deep into that hole, you end up rewriting the kernel in Rust.

awalsh128 9/6/2025|||

Yeah, there is a lot of snark in the article which undermines their argument.

IncreasePosts 9/5/2025||

It's written by amateurs, but solves problems that only Google(one of the biggest/most advanced tech companies in the world) has.

barrkel 9/5/2025||

I'm afraid that this is a case of someone imagining that there are Platonic ideal concepts that don't evolve over time, that programs are perfectible. But people are not immortal and everything is always changing.

I almost burst out in laughter when the article argued that you should reuse types in preference to inlining definitions. If you've ever felt the pain of needing to split something up, you would not be so eager to reuse. In a codebase with a single process, it's pretty trivial to refactor to split things apart; you can make one CL and be done. In a system with persistence and distribution, it's a lot more awkward.

That whole meaning of data vs representation thing. There's fundamentally a truth in the correspondence. As a program evolves, its understanding of its domain increases, and the fidelity of its internal representations increase too, by becoming more specific, more differentiated, more nuanced. But the old data doesn't go away. You don't get to fill in detail for data that was gathered in older times. Sometimes, the referents don't even exist any more. Everything is optional; what was one field may become two fields in the future, with split responsibilities, increased fidelity to the domain.

iamdelirium 9/5/2025|

Yeah, oneOf fields can be repeated but you can just wrap them in a message. It's not as pretty but I've never had any issues with this.

The fact that the author is arguing for making all messages required means they don't understand the reasoning for why all fields are optional. This breaks systems (there are are postmortems outlining this) then there are proto mismatches .

More comments...