Posted by jesseduffield 12/26/2025

Always bet on text (2014)(graydon2.dreamwidth.org)
347 points | 181 comments | page 5
benatkin 12/27/2025|
I was surprised to see something was in text today, until I remembered knowing it at some point - the .har format. Looking at simonw's Claude-generated script [1] to investigate AI agent sent emails [2] by extracting .har archives, I saw that it uses base64 for binary and JSON strings for text.

It might be a good bet to bet on text, but it feels inefficient a lot of the time, especially in cases like this where all sorts of files are stored in JSON documents.

1: https://gist.github.com/simonw/007c628ceb84d0da0795b57af7b74...

2: https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/
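To put a rough number on the inefficiency mentioned above, here is a minimal sketch of how binary data grows when it is base64-encoded and placed in a JSON string. The field names `encoding` and `text` imitate a HAR-style content entry, but the helper names and the sample data are made up for illustration, and this is not taken from simonw's script.

```python
import base64
import json


def har_style_entry(raw: bytes) -> str:
    """Wrap binary data in a JSON object with a base64 text field (illustrative layout)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"encoding": "base64", "text": encoded})


def decode_entry(document: str) -> bytes:
    """Recover the original bytes from the JSON document."""
    fields = json.loads(document)
    return base64.b64decode(fields["text"])


# Hypothetical sample: 25,600 bytes that are not valid text.
raw = bytes(range(256)) * 100
document = har_style_entry(raw)
ratio = len(document) / len(raw)
print(f"{len(raw)} raw bytes -> {len(document)} JSON characters ({ratio:.2f}x)")
```

Base64 maps every 3 bytes to 4 characters, so the text form is about a third larger before any compression is applied.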

TimByte 12/27/2025||
Text feels boring only until you notice how much work it quietly does better than everything else... And pragmatically: text respects time and bandwidth
ivanjermakov 12/27/2025||
Another fascinating property of text (as compared to video): it's less time-sensitive. That makes it much easier to skim through and skip sections, kind of like teleporting through the time it took to write the text.
citbl 12/27/2025||
The last 2 paragraphs were quite poetic.

PS: 2014

firemelt 12/27/2025||
This. My thesis should be more text-to-text instead of image-to-text.
imedadel 12/27/2025||
Always bet on language*
qntmfred 12/27/2025||
there is a surprising number of images used in that post.
doctorleff 12/29/2025||
All data structures can be represented as graphs. I use the term "graph" for a collection of nodes (dots) and edges. (The rest of this paragraph introduces this concept of graph, as per this definition, for those not familiar with it.) Imagine a set of islands connected by bridges; the nodes are the islands; the bridges are the edges. [1] A different kind of example is a conflict graph; the program reads a set of courses, a set of students, and for each student, the courses that student wants to take; the nodes would be courses; every time two or more students want to take the same two courses, the program creates an edge between those two courses. [2]

[1] Seven Bridges of Königsberg, Wikipedia, https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsbe... Graph Theory, https://discrete.openmathbooks.org/dmoi3/ch_graphtheory.html

[2] Runa Ganguli and Siddhartha Roy, "A Study on Course Timetabling based on the Graph Coloring Approach" International Journal of Computational and Applied Mathematics, Volume 2, No 17, 469-485.

A computer program would process this graph to schedule the courses so there are no conflicts or few conflicts. In other words, it would try to satisfy as many students as possible. This is in contrast with the term "graph" that one saw in high school or junior high school; that represents a function. An example would be the line chart where the height of a child is on the y-axis and their age is on the x-axis, with a point for each time their height was taken and lines connecting one height data point to the next.
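As a rough sketch of the conflict graph described above, the following builds the graph from student requests and then assigns time slots with a plain greedy coloring. The greedy coloring is my stand-in, not the method from the cited paper, and it does not guarantee the fewest slots. The names `build_conflict_graph`, `greedy_slots`, and `min_students`, and the sample data, are all hypothetical; `min_students=1` creates an edge as soon as one student wants both courses, while `min_students=2` matches the "two or more students" wording.

```python
from collections import Counter
from itertools import combinations


def build_conflict_graph(requests, min_students=1):
    """Nodes are courses; an edge joins two courses wanted by enough students."""
    pair_counts = Counter()
    graph = {}
    for courses in requests.values():
        unique = sorted(set(courses))
        for course in unique:
            graph.setdefault(course, set())  # keep courses with no conflicts
        pair_counts.update(combinations(unique, 2))
    for (a, b), count in pair_counts.items():
        if count >= min_students:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_slots(graph):
    """Give each course the lowest slot not used by a conflicting course."""
    slots = {}
    # Visit the most-constrained courses first; ties broken by name.
    for course in sorted(graph, key=lambda c: (-len(graph[c]), c)):
        taken = {slots[n] for n in graph[course] if n in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[course] = slot
    return slots


# Hypothetical requests: student -> courses they want to take.
requests = {
    "ann": ["algebra", "biology"],
    "bob": ["biology", "chemistry"],
    "cat": ["algebra", "drama"],
}
graph = build_conflict_graph(requests)
slots = greedy_slots(graph)
print(slots)
```

Two courses joined by an edge never share a slot, so every student in this small example can attend everything they asked for.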

All data structures can be represented as graphs. For example, a hypergraph can be represented as a graph where each hyperedge corresponds to a node that is connected to the nodes the hyperedge contains. Objects in a mechanical engineering CAD system or graphics display system are often kept in the winged-edge configuration. That is, we know for each edge, the adjacent faces, and for each face, the edges. Thus, the face is a "hyperedge" with the edges in the diagram being the nodes. [3]

[3] https://en.wikipedia.org/wiki/Winged_edge Stanford Technical Report STAN CS 320 Bruce G. Baumgart, "Winged Edge Polyhedron Representation" http://i.stanford.edu/pub/cstr/reports/cs/tr/72/320/CS-TR-72... Charles Eastman and Kevin Walter, Geometric Modeling Using the Euler Operators, Carnegie Mellon University DRC 15-279, May 1979
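The hypergraph-to-graph conversion above can be sketched as follows. This is only an illustration under my own naming: `hypergraph_to_graph`, the `("hyperedge", ...)` and `("vertex", ...)` tags, and the two-face example are assumptions, and a real winged-edge structure stores more than this (for instance edge ordering around each face).

```python
def hypergraph_to_graph(hyperedges):
    """Turn each hyperedge into a node linked to every member it contains."""
    graph = {}
    for name, members in hyperedges.items():
        edge_node = ("hyperedge", name)
        graph.setdefault(edge_node, set())
        for member in members:
            vertex_node = ("vertex", member)
            graph.setdefault(vertex_node, set()).add(edge_node)
            graph[edge_node].add(vertex_node)
    return graph


# Hypothetical mesh fragment: two triangular faces sharing the edge e3.
faces = {
    "f1": ["e1", "e2", "e3"],
    "f2": ["e3", "e4", "e5"],
}
graph = hypergraph_to_graph(faces)
print(sorted(graph[("vertex", "e3")]))
```

Looking up the node for `e3` gives both faces that border it, which is the kind of adjacency query the winged-edge layout is meant to answer.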

Of course, any graph can be serialized. Often, that would be done in JSON or XML. ChatGPT tells me that the time to serialize a graph is O(V+E) for adjacency lists and O(V^2) for adjacency matrices. That is, any data structure represented as a collection of pointers can be converted into text in time linear in the amount of information in the data structure. Adjacency matrices are used when we want to quickly see whether one entity is connected to another, but that comes at a cost in space and in time to serialize.
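Here is a minimal sketch of the adjacency-list case, using a plain line-per-node text format rather than JSON or XML. The format, the function names `to_text` and `from_text`, and the island names are my own assumptions; node names must not contain spaces or colons for the round trip to work.

```python
def to_text(graph: dict[str, list[str]]) -> str:
    """Write one line per node: 'node: neighbor neighbor ...'."""
    lines = []
    for node, neighbors in graph.items():  # one pass over the V nodes
        # Joining touches each adjacency entry once, so the total work
        # over all nodes grows with V + E.
        lines.append(node + ": " + " ".join(neighbors))
    return "\n".join(lines)


def from_text(text: str) -> dict[str, list[str]]:
    """Rebuild the adjacency lists from the text produced by to_text."""
    graph = {}
    for line in text.splitlines():
        node, _, rest = line.partition(":")
        graph[node] = rest.split()
    return graph


# Hypothetical islands and bridges; each bridge is listed from both ends.
bridges = {
    "north": ["east", "south"],
    "east": ["north"],
    "south": ["north"],
    "west": [],
}
text = to_text(bridges)
print(text)
```

Each node and each adjacency entry is written exactly once, which is where the linear bound comes from.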

Assume one is tracking which students are taking (or are interested in taking) which course. In the computer, the programmer can put this into a rectangular array of size CS where C is the number of courses and S is the number of students. When dumped naively into text, this would take space and time writing to disk proportional to CS. On the other hand, assume that this is a sparse array; on average, each student is only interested in taking 10 courses. We can represent this as a list of average size 10 for each student, which takes time proportional to 10S. (Or more precisely, 10S+C.) In computer science, we call the relation between students and courses a many-to-many relationship.
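A small sketch of the dense versus sparse comparison above, counting how many items each representation has to write. The sizes (200 courses, 1,000 students, 10 courses per student) and the way interests are generated are invented for illustration; only the CS versus 10S comparison comes from the comment.

```python
def dense_rows(interests, courses):
    """One 0/1 cell for every (student, course) pair: C * S cells in total."""
    return [
        [1 if course in wanted else 0 for course in courses]
        for wanted in interests.values()
    ]


def sparse_rows(interests):
    """One list per student holding only the courses they want."""
    return {student: sorted(wanted) for student, wanted in interests.items()}


# Hypothetical data: C = 200 courses, S = 1000 students, 10 courses each.
courses = [f"c{i}" for i in range(200)]
interests = {
    f"s{i}": {courses[(i + 7 * k) % 200] for k in range(10)}
    for i in range(1000)
}

dense_cells = sum(len(row) for row in dense_rows(interests, courses))
sparse_entries = sum(len(row) for row in sparse_rows(interests).values())
print(dense_cells, sparse_entries)
```

With these made-up sizes the dense array holds 20 times as many items as the sparse lists, and the gap widens as the number of courses grows while the per-student average stays fixed.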

See also: [4] https://stackoverflow.com/questions/51783/how-to-serialize-a...

That is the power of sparsity: it reduces the time from a product to a linear function. (The classic graph is a many-to-many relation of something to itself. That is, which island is connected to another island by a bridge, which course is connected to another one by a student interested in both, or which city is connected to another city by a direct flight.) The average number of connections from each entity to other entities is the sparsity, m. Thus, the time to write the data for a sparse representation is proportional to mN, where N is the number of entities (or nodes).

By a little verbal sleight of hand, we say that the many-to-many relationship of students and courses is a graph where some nodes are labeled "student" and others are labeled "course."

Throughout the above discussion, I mostly ignore the constant, which is the time to write one connection to the file, for simplicity. Similarly, with space, there is a proportionality constant: how many bits or bytes it takes to record one student-course connection or one bridge in the island-graph example.

As an aside not relevant to my discussion but relevant to the entire discussion, I just saw a news article on storing JSON in binary. https://devclass.com/2024/01/16/sqlites-new-support-for-bina...

casey2 12/27/2025||
I was thinking about this last night before bed. People often counter that data visualization in 2D and 3D are more important and that we need a visual programming language.

I completely disagree. If LLMs have taught us anything, it's that the semantic space is MASSIVE and has far too many dimensions to visualize. Of course for some specific situations visualizations are great and can give you almost immediate insight, but for truly complex problems the only ability we have as humans that lets us understand complex relationships is language.

Now language can be visual, textual or auditory. But at the end of the day it must be a language. Music notation isn't a language; it's a very simple set of semantics splayed out in a standard way. When people try to increase the semantic density it turns comical. There is also very little contextual relationship between the semantic markings (key affects notes and ties affect notes, but key never affects ties). Whereas in a programming language a single identifier can stand for entire scores. Many people have a shared, somewhat lossy understanding of Unreal, whether they worked with it, played a game built with it or whatever, and that understanding can include a lot more than just the code.

thelastgallon 12/27/2025|
Also, In the Beginning was the Command Line by Neal Stephenson: https://web.stanford.edu/class/cs81n/command.txt

Excerpts where he explains: "Now this was technically a fault in the application (Word 6.0 for the Macintosh) not the operating system (MacOS 7 point something) and so the initial target of my annoyance was the people who were responsible for Word. But. On the other hand, I could have chosen the "save as text" option in Word and saved all of my documents as simple telegrams, and this problem would not have arisen. Instead I had allowed myself to be seduced by all of those flashy formatting options that hadn't even existed until GUIs had come along to make them practicable. I had gotten into the habit of using them to make my documents look pretty (perhaps prettier than they deserved to look; all of the old documents on those floppies turned out to be more or less crap). Now I was paying the price for that self-indulgence. Technology had moved on and found ways to make my documents look even prettier, and the consequence of it was that all old ugly documents had ceased to exist."

and

"When my Powerbook broke my heart, and when Word stopped recognizing my old files, I jumped to Unix. The obvious alternative to MacOS would have been Windows. I didn't really have anything against Microsoft, or Windows. But it was pretty obvious, now, that old PC operating systems were overreaching, and showing the strain, and, perhaps, were best avoided until they had learned to walk and chew gum at the same time.

The changeover took place on a particular day in the summer of 1995. I had been San Francisco for a couple of weeks, using my PowerBook to work on a document. The document was too big to fit onto a single floppy, and so I hadn't made a backup since leaving home. The PowerBook crashed and wiped out the entire file.

It happened just as I was on my way out the door to visit a company called Electric Communities, which in those days was in Los Altos. I took my PowerBook with me. My friends at Electric Communities were Mac users who had all sorts of utility software for unerasing files and recovering from disk crashes, and I was certain I could get most of the file back.

As it turned out, two different Mac crash recovery utilities were unable to find any trace that my file had ever existed. It was completely and systematically wiped out. We went through that hard disk block by block and found disjointed fragments of countless old, discarded, forgotten files, but none of what I wanted. The metaphor shear was especially brutal that day. It was sort of like watching the girl you've been in love with for ten years get killed in a car wreck, and then attending her autopsy, and learning that underneath the clothes and makeup she was just flesh and blood."
