Posted by todsacerdoti 6/30/2025
Plus, an article about C was at the top of Hacker News all day today.
C is still relevant, but TIOBE numbers are not.
The #1 data structure in any program is the array.
Which generic operations does the language provide?
You can generically copy arrays [1], index them, iterate them, compute their size. But that's not the point, the point is that you can instantiate them for any (complete) T.
[1] well, C is weird, so you first need to wrap them in a struct.
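A minimal sketch of the struct-wrapping trick from the footnote (the type and function names here are illustrative, not from any library):

```c
#include <assert.h>
#include <stddef.h>

/* Plain C arrays cannot be assigned: for `int a[4], b[4];`, the line
 * `b = a;` does not compile, because array names decay to pointers.
 * Wrapping the array in a struct restores value semantics: struct
 * assignment copies the whole object, elements included. */
typedef struct { int data[4]; } IntArray4;

/* Copy by struct assignment; the compiler copies all four elements. */
static IntArray4 copy(const IntArray4 *src) {
    return *src;
}

/* Generic size computation, evaluated at compile time. */
static size_t length(const IntArray4 *a) {
    return sizeof a->data / sizeof a->data[0];
}
```

Indexing and iteration then go through the `.data` member as usual; only copying needed the wrapper.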
There are quite a few problems that specialised containers are suited for; that's why they were created.
The situation where you need a red black tree with 10 different key/value combos isn’t real.
Otherwise, that seems unwise to me. Not every user of a generic type has to be generic. A major selling point of generic types is that you write a library once, then everyone can instantiate it. Even if that is the only instance they need in their use case, you have saved them the trouble of reinventing the wheel.
No colleague of mine may need 10 different instances of any of my generic libraries, but I bet that all of them combined do, and that our bosses are happy that we don't have to debug and maintain 10+ different implementations.
Experienced programmers know when to reuse a generic library and when to roll their own. We all know that.
Yet you dismiss generic red black trees because there is no realistic single program that uses 10 key/value combinations. Not only is this false, but as I illustrated in my reply, a single program is not the relevant scope to decide whether to use a generic implementation. And as someone who has written a red black tree library, I am definitely in favor of reusing an implementation unless you have an excellent reason, which "I do not need 10 different instances in my program" or "my favorite language and its standard library only have arrays built in" most definitely are not.
I use generic data structures all the time. That’s the default in programming. We know their advantages.
I am trying to say there is a world out there that doesn’t look like ours and it works just fine and has other advantages.
Are you saying the only real way to program is with generic data structures? I’m saying no, because nobody did so prior to the late ’80s.
> I am definitely in favor of reusing an implementation unless you have an excellent reason
Let me try one more time. If you examine a program you won’t find 10 different red black trees. Data tends to follow a particular path and there is probably 1 red black tree and it’s a core feature of the application.
If that’s true then it’s not a big deal to write that core feature of your application.
It avoids premature optimization and code bloat because you tend to use complex data structures when they have a need.
Array is a good default. It’s how computer memory works, not just what happens to be lying around.
Certainly not. As I said, the experienced programmer knows when (not) to use them. Some programs are better off without them, such as... most of the low-level software I write (security-critical operating systems).
> Data tends to follow a particular path and there is probably 1 red black tree and it’s a core feature of the application. If that’s true then it’s not a big deal to write that core feature of your application.
"It's not a big deal" are famous last words. Maybe it is not a big deal, but unless you have hard constraints or measurements that confirm that you would not be served well enough by an existing solution, it may very well be the reason why your competitors make progress while you are busy reinventing, debugging, maintaining, and obtaining certification for a wheel whose impact you overestimated.
> It avoids premature optimization and code bloat because you tend to use complex data structures when they have a need.
Avoiding code bloat? Agreed, a custom solution cannot be worse than a generic one. Avoiding premature optimization? Quite the contrary: going straight for the custom solution without first confirming, with actual measurements, that the generic solution is the unacceptable bottleneck, is textbook premature optimization.
I am sorry but I do not understand what you are getting at.
No. There are a few fundamental ones which work well in a generic context. When you tailor-make one, you find simplifying assumptions.
One I am aware of is a hash map that doesn’t need to delete individual keys.
Somehow dissemination of Knuth vol. 3, chapter 6.4, Algorithm D has been weak, though it has lately become known as "backshift deletion" - unsure who coined that. It is limited to linear probing (but then these days that also creates the least memory traffic/potential latency). With this approach, there is no real specialized form of "hash tables that can delete". You may already know all of this, and I'm not disagreeing with anything. Just a natural follow-up.
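For the curious, here is a minimal sketch of backshift deletion for a linear-probing table, assuming a fixed-size table and a toy modulo hash (all names are mine, not from Knuth or the Nim code mentioned below):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CAP 16  /* fixed capacity for the sketch; real tables resize */

typedef struct { bool used; unsigned key; int val; } Slot;
typedef struct { Slot slots[CAP]; } Table;

static size_t home(unsigned key) { return key % CAP; }  /* toy hash */

/* true iff x lies in the cyclic half-open interval [lo, hi) */
static bool in_cyclic(size_t x, size_t lo, size_t hi) {
    return (lo <= hi) ? (lo <= x && x < hi) : (x >= lo || x < hi);
}

static bool insert(Table *t, unsigned key, int val) {
    for (size_t n = 0; n < CAP; n++) {
        size_t i = (home(key) + n) % CAP;
        if (!t->slots[i].used || t->slots[i].key == key) {
            t->slots[i] = (Slot){ true, key, val };
            return true;
        }
    }
    return false;  /* table full */
}

static int *find(Table *t, unsigned key) {
    for (size_t n = 0; n < CAP; n++) {
        size_t i = (home(key) + n) % CAP;
        if (!t->slots[i].used) return NULL;  /* empty slot ends the probe */
        if (t->slots[i].key == key) return &t->slots[i].val;
    }
    return NULL;
}

/* Backshift deletion: after removing an entry, walk the probe chain and
 * shift back any entry that could legally fill the hole, so no probe
 * chain is ever broken and no tombstones are needed. */
static bool del(Table *t, unsigned key) {
    size_t hole = home(key), n;
    for (n = 0; n < CAP; n++, hole = (hole + 1) % CAP) {
        if (!t->slots[hole].used) return false;  /* not present */
        if (t->slots[hole].key == key) break;
    }
    if (n == CAP) return false;
    size_t j = (hole + 1) % CAP;
    for (size_t steps = 0; steps < CAP && t->slots[j].used; steps++) {
        /* slots[j] may move into the hole only if the hole lies on its
         * own probe path, i.e. cyclically between its home slot and j */
        if (in_cyclic(hole, home(t->slots[j].key), j)) {
            t->slots[hole] = t->slots[j];
            hole = j;
        }
        j = (j + 1) % CAP;
    }
    t->slots[hole].used = false;
    return true;
}
```

The point of the thread: insert and find above are exactly what an insert-only table would have; only del is extra, and no per-slot tombstone state leaks into the other operations.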
Isn't resizing logic already a counterexample?
If you _do_ know how many entries you're going to be adding upfront, you probably don't even want a hash table, you're probably better off with an array!
> I'd be much more worried about a home grown
This is rhetorical framing.
As a Knuth reader, you should know that when you look inside the box, you find the home-grown thing of some other idiot who never read the research.
I don't know what would count as "major", but at least to me "major" does not imply "good". As I mentioned, this idea is ancient (the point of citing Knuth from over half a century ago - that was me, not @dwattttt) but also weakly disseminated. In said weakness, it may well be uncommon in "major" impls, but that merely recapitulates my point. Honestly, that and the resizing thing were such weird argumentation that I didn't take the time to reply.
One does usually resize on insert to keep things sparse, as @dwattttt mentioned. With deletions, one might want to resize after a delete to keep things dense, which could help if you iterate over the table a lot after deleting much of it. That is not a cost you would ever pay if you don't delete. So, the ability to delete is still something insert-only workloads don't pay anything for.
Moving past the off-point qualification of "major"-ness & the weird resize points: if you are actually curious about more details on how this can work and want a concrete example, you could look at the Nim stdlib Table implementation: https://github.com/nim-lang/Nim/blob/fbdc9a4c19aafc25937aaa5... or you could read the Knuth subsection that I referenced, though Knuth uses a rather archaic linear probing toward smaller indices, which probably seems unnatural to modern readers. There is nothing special added to the insert-only data/pathways to support deletes, either in that Nim code or in Knuth.
Deletes done this way can benefit from saving integer hash codes (and that happens to be done in the Nim example), but so do all finds (via a cheap hash-code compare before the full key compare) and resizes (by avoiding calls to hash()). So, in the space-time trade-off analysis of "is saving hash codes worth it?", delete finds & backshifts are just one more part of some workload. This is "deletes in a workload impact space-time trade-offs", though, not "the ability to delete costs insert-only workloads", which seems to be the contentious point.
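The hash-code-saving idea can be sketched like this, with string keys, where the full compare is actually expensive (the FNV-1a hash and all names here are stand-ins of my choosing, not what Nim uses):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Each slot caches the full hash code next to the key.  A find can
 * then reject almost every non-matching slot with one integer compare,
 * and a resize can re-bucket entries without re-hashing the keys. */
typedef struct {
    const char *key;   /* NULL means the slot is empty */
    size_t      hash;  /* cached hash(key) */
    int         val;
} Slot;

/* FNV-1a, a simple well-known string hash; any hash would do here. */
static size_t fnv1a(const char *s) {
    size_t h = (size_t)14695981039346656037ull;
    for (; *s; s++)
        h = (h ^ (unsigned char)*s) * (size_t)1099511628211ull;
    return h;
}

/* The cheap integer compare runs first; strcmp only on a hash match. */
static bool slot_matches(const Slot *s, const char *key, size_t hash) {
    return s->key != NULL && s->hash == hash && strcmp(s->key, key) == 0;
}
```

In a probe loop, find, delete, and backshift would all call slot_matches (or reuse the cached hash to recompute home slots), which is why the saving is shared across the whole workload rather than being a delete-only cost.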