
Posted by todsacerdoti 6/30/2025

I write type-safe generic data structures in C (danielchasehooper.com)
412 points | 182 comments
adamnemecek 7/1/2025||
Dude the ship has sailed.
dhooper 7/1/2025|
Not sure what you mean by that, but if you're trying to imply that C is not relevant: https://www.tiobe.com/tiobe-index/

Plus an article about C was at the top of hacker news all day today.

adamnemecek 7/1/2025|||
Not C per se, but trying to make C type safe.
tialaramex 7/1/2025|||
TIOBE is bullshit. When somebody cites TIOBE what they mean is either "I have no idea what I'm talking about, but here's something authoritative seeming that agrees with me" or worse "I know this is bullshit and I don't care"

C is still relevant, but TIOBE numbers are not.

monkeyelite 6/30/2025|
Another way is to not try to write generic data structures. When you tailor them to the use case you can simplify.

The #1 data structure in any program is the array.

gpderetta 7/1/2025||
The irony is that arrays in C are in fact generic. As that is the simplest generic solution that doesn't require boiling the ocean first, that's what many programmers reach for.
monkeyelite 7/2/2025|||
How do I write a C function which operates on an array of type T?

Which generic operations does the language provide?

gpderetta 7/2/2025||
Arrays are generic, but of course C doesn't provide generic functions.

You can generically copy arrays [1], index them, iterate them, compute their size. But that's not the point, the point is that you can instantiate them for any (complete) T.

[1] well, C is weird, so you first need to wrap them in a struct.
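
A minimal sketch of that point (my own illustration, with Point standing in for an arbitrary complete type T; the struct wrapper at the end is the trick from [1]):

    #include <stdio.h>

    typedef struct { double x, y; } Point;        /* any complete type T works the same way */
    typedef struct { Point elems[4]; } PointBuf;  /* wrapping makes the array copyable by assignment */

    int main(void) {
        Point pts[4] = { {0, 0}, {1, 1}, {2, 4}, {3, 9} };

        size_t n = sizeof pts / sizeof pts[0];     /* compute the size generically */
        for (size_t i = 0; i < n; i++)             /* index and iterate generically */
            printf("(%g, %g)\n", pts[i].x, pts[i].y);

        PointBuf a = { { {0, 0}, {1, 1}, {2, 4}, {3, 9} } };
        PointBuf b = a;                            /* whole-array copy via the struct wrapper */
        (void)b;
        return 0;
    }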

dwattttt 6/30/2025||
When all you have are arrays, everything looks like a problem you solve with arrays.

There are quite a few problems that specialised containers are suited for; that's why they were created.

monkeyelite 7/1/2025||
And you can write them when you need them.

The situation where you need a red black tree with 10 different key/value combos isn’t real.

el_pollo_diablo 7/1/2025|||
If, by "situation", you mean the development of a small program with so many constraints that using existing libraries is out if the question, then yes.

Otherwise, that seems unwise to me. Not every user of a generic type has to be generic. A major selling point of generic types is that you write a library once, then everyone can instantiate it. Even if that is the only instance they need in their use case, you have saved them the trouble of reinventing the wheel.

No colleague of mine may need 10 different instances of any of my generic libraries, but I bet that all of them combined do, and that our bosses are happy that we don't have to debug and maintain 10+ different implementations.

monkeyelite 7/1/2025||
Ok you’re telling me the upside which we already know. Now what’s the downside?
el_pollo_diablo 7/2/2025||
You mean the downside that we also already know, i.e. that there are some situations where a custom data structure would be superior for various reasons (e.g. smaller footprint)?

Experienced programmers know when to reuse a generic library and when to roll their own. We all know that.

Yet you dismiss generic red black trees because there is no realistic single program that uses 10 key/value combinations. Not only is this false, but as I illustrated in my reply, a single program is not the relevant scope to decide whether to use a generic implementation. And as someone who has written a red black tree library, I am definitely in favor of reusing an implementation unless you have an excellent reason, which "I do not need 10 different instances in my program" or "my favorite language and its standard library only have arrays built in" most definitely are not.

monkeyelite 7/2/2025||
> Yet you dismiss generic red black trees

I use generic data structures all the time. That’s the default in programming. We know their advantages.

I am trying to say there is a world out there that doesn’t look like ours and it works just fine and has other advantages.

Are you saying the only real way to program is with generic data structures? I’m saying no because nobody did prior to the late 80s.

> I am definitely in favor of reusing an implementation unless you have an excellent reason

Let me try one more time. If you examine a program you won’t find 10 different red black trees. Data tends to follow a particular path and there is probably 1 red black tree and it’s a core feature of the application.

If that’s true then it’s not a big deal to write that core feature of your application.

It avoids premature optimization and code bloat because you tend to use complex data structures when they have a need.

Array is a good default. It’s how computer memory works, not just what happens to be lying around.

el_pollo_diablo 7/2/2025||
> Are you saying the only real way to program is with generic data structures?

Certainly not. As I said, the experienced programmer knows when (not) to use them. Some programs are better off without them, such as... most of the low-level software I write (security-critical operating systems).

> Data tends to follow a particular path and there is probably 1 red black tree and it’s a core feature of the application. If that’s true then it’s not a big deal to write that core feature of your application.

"It's not a big deal" are famous last words. Maybe it is not a big deal, but unless you have hard constraints or measurements that confirm that you would not be served well enough by an existing solution, it may very well be the reason why your competitors make progress while you are busy reinventing, debugging, maintaining, and obtaining certification for a wheel whose impact you overestimated.

> It avoids premature optimization and code bloat because you tend to use complex data structures when they have a need.

Avoiding code bloat? Agreed, a custom solution cannot be worse than a generic one. Avoiding premature optimization? Quite the contrary: going straight for the custom solution without first confirming, with actual measurements, that the generic solution is the unacceptable bottleneck, is textbook premature optimization.

I am sorry but I do not understand what you are getting at.

gpderetta 7/1/2025||||
If you can only write programs by banging rocks together, you will only produce programs that can be written by banging rocks together.
monkeyelite 7/2/2025||
And if you're used to writing programs with Dictionary<String, Dictionary<String, Array<String>>> you will do that too.
dwattttt 7/1/2025|||
You could take away anything you use and say "but we could make it ourselves", that doesn't mean it's helpful.
monkeyelite 7/1/2025||
Except it’s very common for C programs to contain one-off data structures, so it’s not a hypothetical. It’s a concrete programming style.
dwattttt 7/1/2025|||
Do you mean a data structure they only use once? Or one that's never been done elsewhere? If they only use it once, that seems like the worst effort/pay-off ratio you can get writing it yourself. And I don't think there's that many fundamental data structures out there... and even then, why would it be good to be forced to make your bespoke structure out of only arrays, when things like maps exist?
monkeyelite 7/1/2025||
> And I don't think there's that many fundamental data structures out there

No. There are a few fundamental ones which work well in a generic context. When you tailor-make it you find simplifying assumptions.

One I am aware of is a hash map that doesn’t need to delete individual keys.
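
For illustration, a rough sketch of that kind of simplification (all names hypothetical): an insert-only, open-addressed string-to-int map. With no deletion there are no tombstones and no backshifting, so lookup and insert are each a single linear-probe loop:

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    #define CAP 256                                  /* power of two; no resizing in this sketch */
    typedef struct { const char *key; int value; } Slot;
    static Slot table[CAP];

    static uint64_t hash_str(const char *s) {        /* FNV-1a */
        uint64_t h = 1469598103934665603ull;
        while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211ull; }
        return h;
    }

    static void put(const char *key, int value) {    /* insert or update; assumes the table never fills */
        for (uint64_t i = hash_str(key); ; i++) {
            Slot *s = &table[i & (CAP - 1)];
            if (s->key == NULL) { s->key = key; s->value = value; return; }
            if (strcmp(s->key, key) == 0) { s->value = value; return; }
        }
    }

    static int *get(const char *key) {
        for (uint64_t i = hash_str(key); ; i++) {
            Slot *s = &table[i & (CAP - 1)];
            if (s->key == NULL) return NULL;                  /* miss */
            if (strcmp(s->key, key) == 0) return &s->value;   /* hit */
        }
    }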

cb321 7/1/2025||
This is true of the common tombstone approach to deletions in hash tables (which also requires rehashing, as in resizing, if there are too many tombstones).

Somehow dissemination of Knuth vol. 3, chapter 6.4, Algorithm D has been weak, though it has lately become known as "backshift deletion" - unsure who coined that. It is limited to linear probing (but then these days that also creates the least memory traffic/potential latency). With this approach, there is no real specialized form of "hash tables that can delete". You may already know all of this, and I'm not disagreeing with anything. Just a natural follow-up.
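
A rough sketch of backshift deletion for linear probing (my own illustration of the idea, not the Nim or Knuth code; it assumes an open-addressed table where an empty slot has key == NULL and each entry's hash is stored):

    #include <stdint.h>
    #include <stddef.h>

    #define CAP 256                                  /* power of two */
    typedef struct { const char *key; int value; uint64_t hash; } Slot;

    /* Delete by backshifting: empty the slot, then walk forward and pull back
       any entry whose probe path from its home slot would otherwise be cut
       off by the new hole. No tombstones are ever written. */
    static void del_at(Slot *table, size_t hole) {
        table[hole].key = NULL;
        for (size_t i = (hole + 1) & (CAP - 1);
             table[i].key != NULL;
             i = (i + 1) & (CAP - 1)) {
            size_t home = (size_t)(table[i].hash & (CAP - 1));    /* ideal slot */
            /* The hole lies on this entry's probe path iff, cyclically,
               home -> hole is no longer than home -> i. */
            if (((hole - home) & (CAP - 1)) <= ((i - home) & (CAP - 1))) {
                table[hole] = table[i];    /* shift the entry back into the hole */
                table[i].key = NULL;       /* its old slot becomes the new hole */
                hole = i;
            }
        }
    }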

monkeyelite 7/2/2025||
If we look at the major standard libraries for hash tables, you're telling me there will be no overhead to supporting the ability to delete keys?

Isn't resizing logic already a counterexample?

dwattttt 7/2/2025||
Resizing has to happen regardless of the ability to delete a key, unless you know upfront how many entries you're going to add; I'd be much more worried about a home grown hash table's tuning to rebalance itself at say, 80% buckets full.

If you _do_ know how many entries you're going to be adding upfront, you probably don't even want a hash table, you're probably better off with an array!

monkeyelite 7/2/2025||
You didn’t answer my question.

> I'd be much more worried about a home grown

This is rhetorical framing.

As a Knuth reader you should know that when you look inside the box you find some other idiot's home-grown thing, written by someone who never read the research.

cb321 7/2/2025||
> look at the major standard libraries for hash tables

I don't know what would count as "major", but at least to me "major" does not imply "good". As I mentioned, this idea is ancient (the point of citing Knuth from over half a century ago - that was me, not @dwattttt) but also weakly disseminated. In said weakness, it may well be uncommon in "major" impls, but that merely recapitulates my point. Honestly, that and the resizing thing were such weird argumentation that I didn't take the time to reply.

One does usually resize on insert to keep things sparse as @dwattttt mentioned. With deletions, might want to resize after delete to keep things dense which could help if you iterate over the table much after deleting a lot of it. That is not a cost you would ever pay if you don't delete. So, ability to delete is still something insert-only workloads don't pay anything for.

Moving past the off-point qualification of "major"-ness & weird resize points, if you are actually curious about more details how this can work and want a concrete example, you could look at the Nim stdlib Table implementation: https://github.com/nim-lang/Nim/blob/fbdc9a4c19aafc25937aaa5... or you could read the Knuth subsection that I referenced, but Knuth uses a rather archaic linear probing to smaller indices which probably seems unnatural to modern readers. There is nothing special added to the insert-only data/pathways to support deletes either in that Nim code or the Knuth.

Deletes done this way can benefit from saving integer hash codes (and that happens to be done in the Nim example), but so do all finds via prefix compare & resizes by avoiding hash(). So, in the space-time trade-off analysis of "is saving hash codes worth it?", delete finds & backshifts are just one more part of some workload. This is "deletes in a workload impact space-time trade-offs", though, not "ability to delete costs insert-only workloads" which seems to be the contentious point.

el_pollo_diablo 7/1/2025|||
Sure, but it is also very common for C programs to contain data structures that have one use in the program, and could still be instances of a generic type. You mentioned red black trees, which are a perfect example of that.
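
One common way to get that in C (a sketch of the general pattern, not necessarily the article's approach; a growable vector stands in for a red black tree only to keep it short) is to write the library once as a macro and instantiate it per element type, even if the program only ever needs one instance:

    #include <stdio.h>
    #include <stdlib.h>

    /* The "library", written once. Error handling omitted for brevity. */
    #define DEFINE_VEC(Name, T)                                    \
        typedef struct { T *data; size_t len, cap; } Name;         \
        static void Name##_push(Name *v, T item) {                 \
            if (v->len == v->cap) {                                 \
                v->cap = v->cap ? v->cap * 2 : 8;                   \
                v->data = realloc(v->data, v->cap * sizeof(T));     \
            }                                                       \
            v->data[v->len++] = item;                               \
        }

    /* One instantiation, one use in this program, still generic code. */
    DEFINE_VEC(IntVec, int)

    int main(void) {
        IntVec v = {0};
        for (int i = 0; i < 5; i++) IntVec_push(&v, i * i);
        for (size_t i = 0; i < v.len; i++) printf("%d\n", v.data[i]);
        free(v.data);
        return 0;
    }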