Posted by jesseduffield 12/26/2025
It might be a good bet to bet on text, but it feels inefficient a lot of the time, especially in cases like this where all sorts of files are stored in JSON documents.
1: https://gist.github.com/simonw/007c628ceb84d0da0795b57af7b74...
2: https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/
PS: 2014
A computer program would process this graph to schedule the courses so that there are no conflicts, or as few as possible. In other words, it would try to satisfy as many students as possible. This is in contrast with the "graph" one saw in high school or junior high school, which represents a function. An example would be a line chart with a child's height on the y-axis and their age on the x-axis, a point for each time their height was measured, and lines connecting each data point to the next.
All data structures can be represented as graphs. For example, a hypergraph can be represented as a graph where each hyperedge corresponds to a node that is connected to the nodes the hyperedge contains. Objects in a mechanical engineering CAD system or graphics display system are often kept in the winged-edge configuration. That is, we know the adjacent faces for each edge and the edges for each face. Thus, the face is a "hyperedge" with the edges in the diagram being the nodes. [3]
[3] https://en.wikipedia.org/wiki/Winged_edge
Bruce G. Baumgart, "Winged Edge Polyhedron Representation", Stanford Technical Report STAN CS 320, http://i.stanford.edu/pub/cstr/reports/cs/tr/72/320/CS-TR-72...
Charles Eastman and Kevin Weiler, "Geometric Modeling Using the Euler Operators", Carnegie Mellon University DRC 15-279, May 1979
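To make the hypergraph point concrete, here is a rough Python sketch of that conversion. The function name, the node tagging scheme, and the two-face example are my own assumptions for illustration, not anything from the winged-edge papers.

```python
def hypergraph_to_bipartite(hyperedges):
    """Turn {hyperedge_name: members} into an ordinary undirected graph.

    Each hyperedge becomes its own node, linked to every member it contains.
    Nodes are tagged tuples so a hyperedge and a member can share a name.
    """
    graph = {}
    for edge_name, members in hyperedges.items():
        edge_node = ("hyperedge", edge_name)
        graph.setdefault(edge_node, set())
        for member in members:
            member_node = ("node", member)
            graph[edge_node].add(member_node)
            graph.setdefault(member_node, set()).add(edge_node)
    return graph


# Hypothetical example: two triangular faces that share the edge "e2".
faces = {"f1": ["e1", "e2", "e3"], "f2": ["e2", "e4", "e5"]}
graph = hypergraph_to_bipartite(faces)

# The shared edge ends up adjacent to both faces.
shared = graph[("node", "e2")]
```

Here each face plays the role of a hyperedge and each polyhedron edge plays the role of a node, which matches the face/edge reading above.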
Of course, any graph can be serialized. Often, that would be done in JSON or XML. ChatGPT tells me that the time to serialize a graph is O(V+E) for adjacency lists and O(V^2) for adjacency matrices. That is, any data structure represented as a collection of pointers can be converted into text in time linear in the amount of information in the data structure. Adjacency matrices are used when we want to quickly see whether one entity is connected to another, but that comes at a cost in space and in time to serialize.
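A small Python sketch of the two serializations, just to show where the O(V+E) and O(V^2) come from. The graph and the function names are made up for the example; only the standard `json` module is used.

```python
import json


def serialize_adjacency_list(adj):
    """Writes every node once and every edge once: O(V + E) items."""
    return json.dumps(adj)


def serialize_adjacency_matrix(adj):
    """Writes one cell per ordered pair of nodes: O(V^2) items."""
    nodes = sorted(adj)
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, targets in adj.items():
        for dst in targets:
            matrix[index[src]][index[dst]] = 1
    return json.dumps({"nodes": nodes, "matrix": matrix})


# Hypothetical directed graph with 4 nodes and 4 edges.
graph = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"]}

list_text = serialize_adjacency_list(graph)
matrix_text = serialize_adjacency_matrix(graph)

edge_count = sum(len(targets) for targets in graph.values())  # items in the list form
cell_count = len(graph) ** 2                                  # cells in the matrix form
```

The list form grows with the number of edges actually present, while the matrix form writes a cell for every possible pair whether or not an edge exists.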
Assume one is tracking which students are taking (or are interested in taking) which courses. In the computer, the programmer can put this into a rectangular array of size C×S (written CS below), where C is the number of courses and S is the number of students. When dumped naively into text, this would take space and time writing to disk proportional to CS. On the other hand, assume that this is a sparse array: on average, each student is only interested in taking 10 courses. We can represent this as a list of average size 10 for each student, which takes space and time proportional to 10S. (Or more precisely, 10S+C.) In computer science, we call the relation between students and courses a many-many relationship.
See also: [4] https://stackoverflow.com/questions/51783/how-to-serialize-a...
That is the power of sparsity: it reduces the time from a product to a linear function. (The classic graph is a many-many relation of something to itself. That is, which island is connected to another island by a bridge, which course is connected to another one by a student interested in both, or which city is connected to another city by a direct flight.) The average number of connections from each entity to other entities is the sparsity, m. Thus, the time to write the data for a sparse representation is proportional to mN, where N is the number of entities (or nodes).
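Here is a quick Python sketch of the student/course numbers, comparing the dense C×S table with per-student lists. The sizes (200 students, 50 courses, 10 courses each) and the random enrollments are assumptions picked only to make the counts easy to check.

```python
import random


def dense_table(enrollments, num_courses):
    """One row per student, one cell per course: C * S cells in total."""
    return [
        [1 if course in chosen else 0 for course in range(num_courses)]
        for chosen in enrollments
    ]


def sparse_lists(enrollments):
    """One short list per student, holding only the courses they picked."""
    return [sorted(chosen) for chosen in enrollments]


random.seed(0)      # fixed seed so the sketch is repeatable
NUM_STUDENTS = 200  # S (assumed)
NUM_COURSES = 50    # C (assumed)
PER_STUDENT = 10    # m, the sparsity (assumed to be exactly 10 here)

enrollments = [
    set(random.sample(range(NUM_COURSES), PER_STUDENT))
    for _ in range(NUM_STUDENTS)
]

dense = dense_table(enrollments, NUM_COURSES)
sparse = sparse_lists(enrollments)

dense_cells = sum(len(row) for row in dense)      # C * S
sparse_entries = sum(len(row) for row in sparse)  # m * S
```

With these assumed sizes the dense table has 10,000 cells and the sparse lists have 2,000 entries, and the gap widens as the course catalog grows while each student's list stays short.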
By a little verbal sleight of hand, we say that the many-many relationship of students to courses is a graph where some nodes are labeled "student" and others are labeled "course."
Throughout the above discussion, I mostly ignore, for simplicity, the constant that is the time to write one connection to the file. Similarly, with space, there is a proportionality constant: how many bits or bytes it takes to record one student-course connection or one bridge in the island-graph example.
As an aside not relevant to my discussion but relevant to the entire discussion, I just saw a news article on storing JSON in binary. https://devclass.com/2024/01/16/sqlites-new-support-for-bina...
I completely disagree. If LLMs have taught us anything, it's that the semantic space is MASSIVE and has far too many dimensions to visualize. Of course, for some specific situations visualizations are great and can give you almost immediate insight, but for truly complex problems the only ability we have as humans that lets us understand complex relationships is language.
Now, language can be visual, textual or auditory. But at the end of the day it must be a language. Music notation isn't a language; it's a very simple set of semantics splayed out in a standard way. When people try to increase the semantic density it turns comical, and there is very little contextual relationship between the semantic markings (key affects notes and ties affect notes, but key never affects ties). A programming language, on the other hand, can have entire scores in a single identifier. Many people have a shared, somewhat lossy understanding of Unreal, whether they worked with it, played a game made with it or whatever, one that can include a lot more than just the code.
Excerpts where he explains: "Now this was technically a fault in the application (Word 6.0 for the Macintosh) not the operating system (MacOS 7 point something) and so the initial target of my annoyance was the people who were responsible for Word. But. On the other hand, I could have chosen the "save as text" option in Word and saved all of my documents as simple telegrams, and this problem would not have arisen. Instead I had allowed myself to be seduced by all of those flashy formatting options that hadn't even existed until GUIs had come along to make them practicable. I had gotten into the habit of using them to make my documents look pretty (perhaps prettier than they deserved to look; all of the old documents on those floppies turned out to be more or less crap). Now I was paying the price for that self-indulgence. Technology had moved on and found ways to make my documents look even prettier, and the consequence of it was that all old ugly documents had ceased to exist."
and
"When my Powerbook broke my heart, and when Word stopped recognizing my old files, I jumped to Unix. The obvious alternative to MacOS would have been Windows. I didn't really have anything against Microsoft, or Windows. But it was pretty obvious, now, that old PC operating systems were overreaching, and showing the strain, and, perhaps, were best avoided until they had learned to walk and chew gum at the same time.
The changeover took place on a particular day in the summer of 1995. I had been San Francisco for a couple of weeks, using my PowerBook to work on a document. The document was too big to fit onto a single floppy, and so I hadn't made a backup since leaving home. The PowerBook crashed and wiped out the entire file.
It happened just as I was on my way out the door to visit a company called Electric Communities, which in those days was in Los Altos. I took my PowerBook with me. My friends at Electric Communities were Mac users who had all sorts of utility software for unerasing files and recovering from disk crashes, and I was certain I could get most of the file back.
As it turned out, two different Mac crash recovery utilities were unable to find any trace that my file had ever existed. It was completely and systematically wiped out. We went through that hard disk block by block and found disjointed fragments of countless old, discarded, forgotten files, but none of what I wanted. The metaphor shear was especially brutal that day. It was sort of like watching the girl you've been in love with for ten years get killed in a car wreck, and then attending her autopsy, and learning that underneath the clothes and makeup she was just flesh and blood."