The Third Hard Problem

Posted by surprisetalk 2 days ago

44 points | 31 comments

efitz 55 minutes ago|

I have always called this the “one true taxonomy” problem, because whenever you sit with multiple stakeholders in a room talking about a taxonomy, you can never get to agreement, because there is no such thing as the “one true taxonomy”.

Any hierarchical taxonomy classifies on one dimension at each taxonomic level. Invariably someone wants to classify on one criterion when someone else wants to classify on another. Taxonomies that humans use aren’t multi-dimensional. So if there is a disagreement, someone wins and someone(s) has to lose.

No one is wrong; they just have different priorities or preferences or goals.

So now as an architect I never argue (and seldom discuss) taxonomies. I make two points and then bow out:

1. Whatever your taxonomy is, you need a rubric for each level. You need a procedure or set of questions that unambiguously map any $THING you encounter into exactly one bucket. Validate that competent people with no specific domain knowledge can properly classify things with your rubric; it must be repeatable by amateurs, not just experts (software is dumb).

2. Existence trumps theory. If there exists a taxonomy and rubric for what you’re classifying, you need to provide a $DARN_GOOD_REASON why this wheel needs reinventing. Personal preference and your 1% edge case probably don’t justify all the work to reinvent everything.
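
A minimal sketch of what point 1 could look like in practice, assuming a rubric is just an ordered list of yes/no questions where the first "yes" decides the bucket. The questions, field names, and bucket names here are all made up for illustration.

```python
from typing import Callable

# Hypothetical rubric: (question text, predicate, bucket).
# Order matters: the first question answered "yes" decides the bucket.
Rule = tuple[str, Callable[[dict], bool], str]

RUBRIC: list[Rule] = [
    ("Does it hold customer data?", lambda t: t.get("customer_data", False), "restricted"),
    ("Is it reachable from the internet?", lambda t: t.get("public", False), "exposed"),
]

# Anything that answers "no" to every question still lands somewhere.
DEFAULT_BUCKET = "internal"


def classify(thing: dict) -> str:
    """Return exactly one bucket: first matching question wins, else the default."""
    for _question, predicate, bucket in RUBRIC:
        if predicate(thing):
            return bucket
    return DEFAULT_BUCKET
```

Because the questions are ordered and there is a default, every input maps to exactly one bucket, which is the property the rubric needs.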

Then, I go back to the implementers and tell them to design in a tagging system, which is a DIY taxonomy, and except in ridiculous use cases, I can make indexes make it fast enough to let everyone overlay their own classification system.
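
Roughly what I mean by a tagging system with an index, as a toy in-memory sketch. The class and method names are hypothetical, and a real system would likely put this in a database index rather than a dict.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: tag -> set of item ids carrying that tag."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, item_id, *tags):
        """Attach any number of tags to an item."""
        for t in tags:
            self._by_tag[t].add(item_id)

    def find(self, *tags):
        """Return the items that carry all of the given tags."""
        if not tags:
            return set()
        # .get avoids creating empty entries for unknown tags.
        sets = [self._by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)
```

Each stakeholder can then define their own "taxonomy" as a set of tag queries over the same items, without anyone having to win the argument.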

gobdovan 1 hour ago||

I have a deep distrust of hierarchies, because they keep you trapped in a single model that keeps extending its authority, usually without anyone explicitly deciding that it should do so. For example, the file system: once hierarchy was deemed the main metaphor for navigation, the structure persisted and was reused for organisation, ownership, access control and governance. And it became infrastructure we cannot easily remove before we could even question if it was right or not. And once it dominated, non-hierarchical things were retrofitted as glue, e.g.: symlinks, aliases, shortcuts... also, when's the last time you've used a tag?

The webs are so much more malleable, but they're also not free. All the 'good enough's that a hierarchy was taking care of implicitly are now your responsibility to model precisely and make sure they're performant as well. Look at ReBAC, for example. It gives you expressive power, but it also forces you to reason precisely about relationships, graph traversal, consistency, and cost. Strikingly similar is GraphQL.
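
To make the graph-traversal point concrete, here is a deliberately tiny sketch that treats every relationship as a plain "grants access to" edge and answers a permission check by reachability. The edge data and function names are invented, and real ReBAC systems distinguish relation types and do far more than this.

```python
from collections import defaultdict, deque

# Hypothetical relationship edges: (source, target) means
# "source gets whatever access target's subtree implies".
EDGES = [
    ("user:alice", "group:eng"),     # alice is a member of eng
    ("group:eng", "folder:design"),  # eng can view the design folder
    ("folder:design", "doc:spec"),   # the folder contains the spec
    ("user:bob", "group:sales"),     # bob is in an unrelated group
]


def build_graph(edges):
    """Adjacency list from (source, target) pairs."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return graph


def has_access(graph, subject, resource):
    """Breadth-first search: is resource reachable from subject?"""
    seen = {subject}
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        if node == resource:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:  # the seen set guards against cycles
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Even in this toy, cycles and traversal cost are things you now have to think about yourself, which a plain directory tree never asked of you.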

Interestingly, source code is hierarchical, but compiles almost immediately to a graph IR and most analysis and optimisations happen there. But almost nobody looks at a CFG/SSA graph directly. You author in a hierarchical manner, yet the operational substrate is a malleable graph.

speed_spread 11 minutes ago|

IMO hard links are underused in filesystems. You can have the same file / dir appear in different places under different names. Once linked, app doesn't have to care and runtime cost is zero.
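
A quick sketch of the idea using Python's `os.link`, assuming a filesystem that supports hard links. The file names are made up, and this sticks to a regular file, since many systems refuse hard links to directories.

```python
import os
import tempfile


def demo_hard_link():
    """Create a file, hard-link it under a second name, write via the link."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "report.txt")
        alias = os.path.join(root, "same-report.txt")

        with open(original, "w") as f:
            f.write("v1")

        # Both names now refer to the same underlying file.
        os.link(original, alias)

        # Appending through one name is visible through the other.
        with open(alias, "a") as f:
            f.write("+v2")
        with open(original) as f:
            content = f.read()

        same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
        link_count = os.stat(original).st_nlink
        return content, same_inode, link_count
```

The application opening either path never needs to know a link is involved.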

gw32 15 minutes ago||

Well elucidated. This problem has irked me for years in the form of multiple inheritance. When it's disallowed (like Java, unfortunately), trying to reduce a directed graph structure to a single dominant hierarchy is quite the bothersome choice.
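
For contrast, a small Python sketch (class names are invented) of what a language that does allow multiple inheritance ends up doing: it still has to flatten the graph into one lookup order, the MRO.

```python
class Persistable:
    def describe(self):
        return "persistable"


class Drawable:
    def describe(self):
        return "drawable"


class Sprite(Drawable, Persistable):
    """Inherits from both parents; neither is 'the' parent."""


def mro_names(cls):
    """Names of the classes in method resolution order."""
    return [c.__name__ for c in cls.__mro__]
```

Here `Sprite().describe()` resolves to `Drawable.describe` because `Drawable` is listed first, so even with the graph allowed, one ordering still wins at lookup time.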

chaboud 3 hours ago||

The problem with trees is that they are a dimensional reduction, an aggregation; taking a problem without directionality and applying a useful/functional hierarchy.

And that's a problem because Aggregability is NP-Hard: https://dl.acm.org/doi/abs/10.1145/1165555.1165556

So a tree is a way to take a high dimensionality graph and make it usefully lower dimensionality, but, given the aforementioned proof, that reduction is going to go from being a lossless compression to a heuristic. So any interesting problem (at least, any problem interesting to me) is only going to be aided (read: not solved exhaustively) by that hierarchy.
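
A small illustration of the lossy part, assuming an undirected graph stored as an adjacency dict and a plain BFS spanning tree as the "hierarchy". It says nothing about the NP-hardness result; it only shows that some relationships have to be dropped. The function name is made up.

```python
from collections import deque


def bfs_tree(graph, root):
    """Return (parent map, edges of the graph that the tree had to drop)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)

    # Compare undirected edges kept by the tree against all edges.
    tree_edges = {frozenset((c, p)) for c, p in parent.items() if p is not None}
    all_edges = {frozenset((a, b)) for a in graph for b in graph[a]}
    return parent, all_edges - tree_edges


GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
```

With this graph and root `"a"`, the tree keeps three of the five edges, and the b-c and c-d relationships simply vanish from the hierarchy.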

I'm okay with this. Being okay with this has been one of the most freeing things over the last 20 years of my career. Accept inaccuracy, and find usefulness in your data structures.

iamwil 26 minutes ago||

I think I've always called this "Ontology is hard". It's genuinely useful when it's used as a tool for clarification. It's constraining when it's used as a tool for modeling.

et1337 4 hours ago||

I think all three problems are really one problem under the hood:

Are these two things actually the same thing, or are they separate?

tikhonj 4 hours ago||

Reminds me of my favorite math essay: "When is one thing equal to some other thing?"

It's a great question, much deeper and more interesting than it seems. The essay suggests thinking in terms of isomorphisms (relative to the structure you care about) rather than equality in some absolute sense, and I've found a fuzzy version of that to be a really useful perspective even in areas that can't be fully formalized.

https://people.math.osu.edu/cogdell.1/6112-Mazur-www.pdf

hackthemack 2 hours ago|||

I jumped to a similar conclusion right away and popped over here to comment only to find you have beaten me to the punch. I used to keep a work wiki page of common problems the team encountered over and over again.

Years ago, I stumbled upon the fact that the "idea" was already debated in other fields long before programming: Lumpers and Splitters.

https://en.wikipedia.org/wiki/Lumpers_and_splitters

et1337 1 hour ago||

Wow, thanks for that, TIL! I’m definitely a code lumper.

hexasquid 2 hours ago|||

"Ambiguity is the enemy", as a rule of thumb, has helped me

aleksiy123 3 hours ago|||

Or non binary. How much are these the same and how.

evmar 3 hours ago||

One nice tool for analyzing maps as a tree is the dominator tree. I wrote a bit about it here: https://neugierig.org/software/blog/2023/07/dominator.html
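
For anyone who has not met dominators before, a minimal sketch of the textbook iterative computation, not the code from the post. It assumes a directed graph as an adjacency dict with a single entry node; the function names and example graph are made up.

```python
def dominators(graph, entry):
    """Map each node reachable from entry to the set of nodes dominating it."""
    # Collect reachable nodes with a simple DFS.
    nodes, stack, seen = [], [entry], {entry}
    while stack:
        node = stack.pop()
        nodes.append(node)
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)

    preds = {n: [] for n in nodes}
    for n in nodes:
        for succ in graph.get(n, ()):
            preds[succ].append(n)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom, entry):
    """Parent of each node in the dominator tree."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue
        # Strict dominators form a chain; the closest has the largest dom set.
        idom[n] = max(ds - {n}, key=lambda d: len(dom[d]))
    return idom


GRAPH = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
```

In this example `c` is reachable through both `a` and `b`, so neither dominates it and its parent in the dominator tree is `main`, while `d` hangs under `c`.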

js8 1 hour ago||

Every few years I watch, with amusement, our management restructuring the organizational hierarchy, allegedly because the old one didn't work.

mcphage 4 hours ago||

I thought the two hard problems were naming things, cache invalidation, and off-by-one errors?

rectang 3 hours ago||

At least the title “The Third Hard Problem” is still appropriate regardless of whether you get the joke right.

fragmede 4 hours ago||

Don't race forget conditions!

cheschire 3 hours ago||

His message was submitted before the memory recall completed execution.
