Posted by jesseduffield 15 hours ago
Bret Victor's point is why is this not also the approach we use for other topics, like engineering? There are many people who do not have a strong symbolic intuition, and so being able to tap into their (and our) other intuitions is a very powerful tool to increase efficiency of communication. More and more, I have found myself in this alternate philosophy of education and knowledge transmission. There are certainly limits—and text isn't going anywhere, but I think there's still a lot more to discover and try.
[1] https://dynamicland.org/2014/The_Humane_Representation_of_Th...
Bret Victor's work involves a ton of really challenging heavy lifting. You walk away from a Bret Victor presentation inspired, but also intimidated by the work put in, and the work required to do anything similar. When you separate his ideas from the work he puts in to perfect the implementation and presentation, the ideas by themselves don't seem to do much.
Which doesn't mean they're bad ideas, but it might mean that anybody hoping to get the most out of them should understand the investment that is required to bring them to fruition, and people with less to invest should stick with other approaches.
Amen to that. Even Dynamicland has some major issues with GC pauses and performance.
I do try to put my money where my mouth is, so I've been contributing a lot to folk computer[1], but yeah, there's still a ton of open questions, and it's not as easy as he sometimes makes it look.
Yes, but musical notation is far superior to text for conveying the information needed to play a song.
Mostly this is straightforwardly correct. Notes on a staff are a textual representation of music.
There are some features of musical notation that aren't usually part of linguistic writing:
- Musical notation is always done in tabular form - things that happen at the same time are vertically aligned. This is not unknown in writing, though it requires an unusual context.
- Relatedly, sometimes musical notation does the equivalent of modifying the value of a global variable - a new key signature or a dynamic notation ("pianissimo") takes effect everywhere and remains in effect until something else displaces it. In writing, I guess quotation marks have similar behavior.
- Musical notation sometimes relates two things that may be arbitrarily far apart from each other. (Consider a slur.) This is difficult to do in a 1-D stream of symbols.
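To make the "global variable" comparison above concrete, here is a minimal sketch of a reader walking a 1-D token stream where a dynamic marking stays in effect until displaced. The token format, the `expand` helper, and the "mf" default are all made up for illustration; real notation formats are far richer.

```python
def expand(tokens, default_dynamic="mf"):
    """Attach the currently active dynamic to every note in the stream."""
    dynamic = default_dynamic  # assumed starting state, not a notation rule
    notes = []
    for kind, value in tokens:
        if kind == "dynamic":
            dynamic = value  # takes effect for everything that follows
        elif kind == "note":
            notes.append((value, dynamic))
    return notes

# Hypothetical score: one marking governs several later notes.
score = [
    ("note", "C4"),
    ("dynamic", "pp"),
    ("note", "E4"),
    ("note", "G4"),
    ("dynamic", "f"),
    ("note", "C5"),
]

for pitch, dynamic in expand(score):
    print(pitch, dynamic)
```

The slur case is the harder one: it would need a token that refers to another token at some distance, which is exactly what a flat stream is bad at.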
> although, one can argue that musical notation is not able to adequately preserve some aspects of musical performance
Nothing new there; that's equally true of writing in relation to speech.
It renders the term "text" effectively meaningless.
I think it's naïve to claim there's a singular best method to communicate. Text is great, especially since it is asynchronous. But even the OP works off of bad assumptions about verbal language being natural and not being taught. And there's a simple fact: when near another person, we strongly prefer speaking to writing. And when we can mix modes, we like to. There's an art to all this, and I think wanting to have a singular mode is more a desire for simplicity than a desire to be optimal.
No, graphs do not need to come from text. I've frequently hand-generated graphs as my means of recording experimental output. This is a common method when high precision is not needed (because your uncertainty level is the size of your markers). But that's true for graphs in general anyways.
Importantly, graphs are better at conveying the relationship between data, rather than information about a single point. (something something - Poincaré ;)
Besides, plots aren't the only types of graphs. Try network graphs.
Besides, graphs aren't the only visual communication of data.
I'll give you an even more obvious one: CAD. Sure, you can do that in text... but it takes much more room to do and orders of magnitude more time to interpret. So much so that everyone is going to retranslate it into a picture. Hell, I'll draw on paper before even pulling up the software and that's not uncommon.
Fascinating example for me. I do CAD... using text! My only experience with it is programmatic, in OpenSCAD. We check the visualization, but only on output of the final product. For me it's dramatically easier to work with. That may be a personal defect but it's also consistent. Underneath the rendering is always data, which is text, markup, but strings of fundamental data.
And in science it's not a stretch at all that numbers come first. I'll argue you're reaching. Today no one is drawing their numbers from experiments directly on a graph. They record them digitally. In textual form typically, and then render them visually to obtain generic understanding. But also there, in the end, your conclusions (per tradition) need to be point estimates with error bounds expressible in concise textual terms. You may obtain them from looking at images but the hard truth is numerical, digital, textual.
Can you tell me more about the pipeline? Are you really starting from scratch by programming? You don't do any sketching first? I'm really having a hard time imagining doing anything reasonably complicated with this method. I'll admit that there are some advantages like guaranteeing bounds but there's so much that seems actually harder to do that way.
> They record them digitally
Like I said, it is contextually dependent. If you're recording with digital equipment to a computer, then yeah, it's just easier to record that way and dump into a plot. But if you don't have that then no. And again, even recording by hand it is still dependent. But some data is naturally image data (pictures?). Some data is naturally in other modalities (chemical reactions? Smell? Texture? Taste?). Yes, with digital recording equipment you can argue that this is all text but at that point I'd argue you're being facetious as everything is text by that definition.
> You may obtain them from looking at images but the hard truth is numerical, digital, textual.
Here I think you have a fundamental misunderstanding and are likely limiting yourself based on your experience. First off, not every measuring device is digital. So just that alone makes it downright false. And pretending all measurements are digital is just deceptive or naive.
Second, and I cannot stress this enough: *every single measurement is a proxy* to the thing you intend to measure.
You can't even measure a damn meter directly. You can measure distance through reference length that is an approximation of a standard distance (aka a ruler). You can measure distance through reference to an approximation of time and through the use of some known velocity, such as the speed of light through a given medium (approximating time, approximating c in the medium, approximating the medium). And so on.
What you cannot do is measure a meter directly.
And most of the things we're trying to measure, model, and approximate in modern science are far more abstract than a standard unit!
The idea that the ground truth is textual is ridiculous. That would only be true on the condition that the universe itself is running on a digital computer. Despite the universe being able to do computation, I see little reason to believe it is digital.
Part of this might be OpenSCAD specifically. It is CSG based, which is really not ideal, making it hard to add things like chamfers and fillets to your model. Most OpenSCAD models I come across for 3D printing have a crude look probably because this is so hard.
But part of it is just that text for most people just isn't the right representation in this case. (If you look at the relative usage of parametric CAD to textual CAD on sites for 3D models you will see that I'm right. Also, look at what approach commercial packages offer.)
Of course, not all graphs are equally information dense, and some are used more for decoration than for actually conveying information. But in the general case, and especially when used well, graphs convey much more information at a glance than a short text description could.
Physical intuition is an enormous part of our intelligence, and is hard to convey in text: you could read millions of words about how to ride a bike, and you would learn nothing compared to spending a few hours trying it out and falling over until it clicks.
What separates text from images is that text is symbolic while images are visceral or feelings based. In the same way, text falls short when it comes to the feeling you get when seeing an image. Try to put into text what you feel when you look at Norman Rockwell's Freedom of Speech or a crappy 0.5MB picture of your daughter taken on an iPhone 3. Hard, isn't it? Visual and symbolic are not isomorphic systems.
Examples of symbolic systems like text are sheet music and Feynman diagrams. You would be hard-pressed if you tried to convey even 2KB of sheet music in a book.
Text is certainly not the best at all things and I especially get the idea that in pedagogy you might want other things in a feedback loop. The strength of text however is its versatility, especially in an age where text transformers are going through a renaissance. I think 90%+ of the time you want to default to text, use text as your source of truth, and then other mediums can be brought into play (perhaps as things you transform your text into) as the circumstances warrant.
> video's inferior to text for communicating ideas efficiently
Depends on the topic tbh. For example, YouTube has had an absolute explosion of car repair videos, precisely because video format works so well for visual operations. But yes, text is currently the best way to skim/revisit material. That's one reason I find Bret's website so intriguing, since he tries to introduce those navigation affordances into a video medium.
> The strength of text however is its versatility, especially in an age where text transformers are going through a renaissance. I think 90%+ of the time you want to default to text, use text as your source of truth, and then other mediums can be brought into play (perhaps as things you transform your text into) as the circumstances warrant.
Agree, though not because of text's intrinsic ability, but because its ecosystem stretches thousands of years. It's certainly the most pragmatic choice of 2025. But, I want to see just how far other mediums can go, and I think there's a lot of untapped potential!
I'd compare its message to a "warning!" sign. It's there to make you stop and think about our computing space; after that, it's up to you to act or not on how you perceive it.
That's totally wishy-washy, so it might not resonate, but after that I went to check more of what dynamicland is doing and sure enough they're doing things that are completely outside of the usual paradigm.
A more recent video explaining the concept in a more practical and down to earth framing: https://youtu.be/PixPSNRDNMU
(here again, reading the transcript won't nearly convey the point. Highly recommend watching it, even sped up if needed)
You can store everything as a string; base64 for binary, JSON for data, HTML for layout, CSS for styling, SQL for queries... Nothing gets closer to the mythical silver-bullet that developers have been chasing since the birth of the industry.
The holy grail of programming has been staring us in the face for decades and yet we still keep inventing new data structures and complex tools to transfer data... All to save like 30% bandwidth, an advantage which is almost fully cancelled out once you GZIP the base64 string, which most HTTP servers do automatically anyway.
Same story with ProtoBuf. All this complexity is added to make everything binary. For what goal? Did anyone ever ask this question? To save 20% bandwidth, which, again is an advantage lost after GZIP... For the negligible added CPU cost of deserialization, you completely lose human readability.
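The bandwidth claim above is easy to check for yourself. Here is a rough sketch that measures the base64 overhead and how much of it gzip gives back. Seeded pseudo-random bytes stand in for an already-compressed payload such as a JPEG; that stand-in and the 30,000-byte size are assumptions for illustration, and the numbers will differ for real data.

```python
import base64
import gzip
import random

# Seeded pseudo-random bytes as a stand-in for an incompressible payload
# (an assumption for illustration, not real image data).
rng = random.Random(0)
raw = bytes(rng.getrandbits(8) for _ in range(30_000))

b64 = base64.b64encode(raw)   # text-safe form of the same bytes
gz_b64 = gzip.compress(b64)   # what a compressing HTTP server would send

def overhead(size: int) -> float:
    """Size increase relative to the raw payload, in percent."""
    return 100.0 * (size - len(raw)) / len(raw)

print(f"raw:          {len(raw)} bytes")
print(f"base64:       {len(b64)} bytes ({overhead(len(b64)):+.1f}%)")
print(f"gzip(base64): {len(gz_b64)} bytes ({overhead(len(gz_b64)):+.1f}%)")

# The text form still round-trips to the original bytes.
assert base64.b64decode(gzip.decompress(gz_b64)) == raw
```

Base64 maps every 3 bytes to 4 characters, so the +33.3% line is fixed. How close the gzip line gets back to the raw size depends on the payload, and this only measures bytes on the wire, not the CPU spent encoding and compressing on both ends.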
In this industry, there are tools and abstractions which are not given the respect they deserve and the humble string is definitely one of them.
You could turn that around & say that, for the negligible human cost of using a tool to read the messages, your entire system becomes slower.
After all, as soon as you gzip your JSON, it ceases to be human-readable. Now you have to un-gzip it first. Piping a message through a command to read it is not actually such a big deal.
You know the rule, "pick 2 out of 3". For a CPU, converting "123" would be a pain in the arse if it had one. Oh, and hexadecimal is even worse BTW; octal is the most favorable case (among "common" bases).
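For a sense of what "converting "123"" involves, here is a hedged sketch of the per-digit work for decimal and octal text, next to reading the same value from a single binary byte. The function names are made up for illustration; real parsers also handle signs, overflow, and whitespace.

```python
def parse_decimal(text: str) -> int:
    """Decimal text: one multiply and one add per digit."""
    value = 0
    for ch in text:
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value

def parse_octal(text: str) -> int:
    """Octal text: each digit is exactly 3 bits, so a shift and an OR suffice."""
    value = 0
    for ch in text:
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 7:
            raise ValueError(f"not an octal digit: {ch!r}")
        value = (value << 3) | digit
    return value

print(parse_decimal("123"))               # 123
print(parse_octal("173"))                 # 123 (1*64 + 7*8 + 3)
print(int.from_bytes(b"\x7b", "little"))  # 123, no per-digit loop at all
```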
Flexibility is a bit of a problem too - I think people generally walked back from Postel's law [1], and text-only protocols are big "customers" of it because of its extreme variability. When you end up using regexps to filter inputs, your solution has become a problem [2] [3]
30% more bandwidth is absolutely huge. I think it is representative of certain developers who have been spoiled with grotesquely overpowered machines and have no idea of the value of bytes, bauds and CPU cycles. HTTP3 switched to binary for even less than that.
The argument that you can make up for text's increased size by compressing base64 is erroneous; one saves bandwidth and processing power on both sides if you can do without compression. Also, with compressed base64 you've already lost the readability on the wire (or out of the wire since comms are usually encrypted anyway).
[1] https://en.wikipedia.org/wiki/Robustness_principle
[2] https://blog.codinghorror.com/regular-expressions-now-you-ha...
AFAICT, the binary format of a protobuf message is strictly there to provide a strong forward/backward compatibility guarantee. If it's not for that, the text proto format and even the JSON format are both versatile, and commonly used as configuration languages (i.e. when humans need to interact with the file).
My old 1995 MS thesis was written in Lotus Word Pro and the last I looked, there was nothing to read it. (I could try Wine, perhaps. Or I could quickly OCR it from paper.) Anyway, I wish it were plain text!
For example, when you gzip a Base64-encoded picture, you end up 1. encoding it in base64 (takes a *lot* of CPU) and then 2. compressing it (again! JPEG is already compressed).
I think what it boils down to is scale; if you are running a small shop and performance is not critical, sure, do everything in HTTP/1.1 if that makes you more productive. But when numbers start mattering, designing binary protocols from scratch can save a lot of $ in my experience.
For example, I've seen a lot of companies obsess over minor stuff like shaving a few bucks off their JSON serialization or using a C binding of some library to squeeze every drop of efficiency out of those technologies... While at the same time letting their software maintenance costs blow out of control... Or paying astronomical cloud compute bills when they could have self-hosted for 1/20th of the price...
Also, the word scale is overused. What is discussed here is performance optimization, not scalability. Scalability doesn't care for fixed overhead costs. Scalability is about growth in costs as usage increases and there is no difference in scalability if you use ProtoBuf or JSON.
The expression that comes to mind is "Penny-wise, pound-foolish." This effect is absolutely out of control in this industry.
Many large-scale systems are in the same camp as you, as their text files flow around their batch processors like crazy, but there's absolutely no flexibility or transparency.
JSON and/or base64 are targeted more at either low-volume or high-latency systems. Once you hit a scale where optimizing a few bits straight saves a significant amount of money, self-labeled fields are just out of the question.
You can still stream the base64 separately and reference it inside the JSON somehow like an attachment. The base64 string is much more versatile.
Using base64 means that you must encode and decode it, but binary data directly means that is unnecessary. (This is true whether or not it is compressed (and/or encrypted); if it is compressed then you must decompress it, but that is independent of whether or not you must decode base64.)
There's nothing special about "text" or binary here. You can absolutely put binary inside other binary; you use a symbol that doesn't appear inside the binary, much like you do for text.
You use a divider, like " is for json, and a prearranged way to avoid that symbol from appearing inside the inner binary (the same approach that works for text works here).
What do you think a zip file is? They're not storing compressed binary data as text, I can tell you that.
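The divider-plus-escaping approach described above can be sketched in a few lines. This is only one possible scheme (byte stuffing), and the `DELIM` and `ESC` values are arbitrary choices for illustration, not taken from any particular format.

```python
DELIM = 0x00  # hypothetical frame terminator
ESC = 0x1B    # hypothetical escape byte

def frame(payload: bytes) -> bytes:
    """Escape DELIM/ESC inside the payload, then terminate with DELIM."""
    out = bytearray()
    for b in payload:
        if b in (DELIM, ESC):
            out.append(ESC)
            out.append(b ^ 0x20)  # flipped byte is never DELIM or ESC
        else:
            out.append(b)
    out.append(DELIM)
    return bytes(out)

def unframe(stream: bytes) -> list[bytes]:
    """Split a stream of frames back into the original payloads."""
    frames, current, escaped = [], bytearray(), False
    for b in stream:
        if escaped:
            current.append(b ^ 0x20)  # undo the flip
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == DELIM:
            frames.append(bytes(current))
            current = bytearray()
        else:
            current.append(b)
    return frames

payloads = [b"\x00\x1b\x00abc", bytes(range(256)), b""]
stream = b"".join(frame(p) for p in payloads)
assert unframe(stream) == payloads
```

The other common option is a length prefix, which needs no escaping at all: write the size first, then the raw bytes.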
I think the obsession with text comes down to two factors: conflating binary data with closed standards and poor tooling support. Text implies a baseline level of acceptable mediocrity for both. Consider a CSV file with millions of base64-encoded columns and no column labels. That would really not be any friendlier than a binary file with an openly documented format and suitable editing tool, e.g. sqlite.
Maybe a lack of fundamental technical skills is another culprit, but binary files really aren't that scary.
Text is human readable writing (not necessarily ASCII). It is most certainly not just any old bytes the way you are saying.
It makes more sense to consider readability or comprehensibility of data in an output format; text makes sense for many kinds of data, but given a graph, I'd rather view it as a graph than as a readable text version.
And if you have a way to losslessly transform data between an efficient binary form, readable text, or some kind of image (or other format), that's the best of all.
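As a small illustration of that kind of lossless round trip, here is a sketch that moves one record between a packed binary form and readable JSON. The record layout (`<IhB`) and field names are invented for the example, and integer fields are used on purpose so no float rounding gets in the way.

```python
import json
import struct

# Hypothetical layout: uint32 id, int16 temperature in centi-degrees, uint8 flags.
RECORD = struct.Struct("<IhB")

def to_binary(rec: dict) -> bytes:
    return RECORD.pack(rec["id"], rec["centi_celsius"], rec["flags"])

def from_binary(blob: bytes) -> dict:
    rec_id, centi_celsius, flags = RECORD.unpack(blob)
    return {"id": rec_id, "centi_celsius": centi_celsius, "flags": flags}

def to_text(rec: dict) -> str:
    return json.dumps(rec, sort_keys=True)

def from_text(text: str) -> dict:
    return json.loads(text)

rec = {"id": 42, "centi_celsius": -1250, "flags": 3}
blob = to_binary(rec)

print(len(blob), "bytes binary")           # 7 bytes binary
print(len(to_text(rec)), "bytes as JSON")
# binary -> text -> binary gives back the identical bytes
assert to_binary(from_text(to_text(from_binary(blob)))) == blob
```

The binary side only works because both ends agree on the layout; the JSON side carries its own field names, which is where the extra bytes go.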
I suppose open standards have slowly been winning with Opus and AV1, but there are still so many forms of interaction that have proprietary or custom interfaces. It seems like anything that has a stable standard has to be at least 20 years old, lol.
Text is like a complexity funnel (analogous to a tokenizer) that everyone shares. Its utility is derived from its compression and its standardization.
If everyone used binary data with their own custom interpretation schema, it might work better for that narrow vertical, but it would not have the same utility for LLMs.
Indeed, there is a galactic civilization centered around binary communication: https://memory-alpha.fandom.com/wiki/Bynar
The inverse is also difficult. Pick a random 15 second movie clip, how to describe it using text without losing much of its essence? Or can one really port a random game into a text version? Can a pilot fly a plane with text-based instrument panel?
Text is not a superset of all communication media. They are just different.
Minor nit: complex language (i.e. Zipf’s law) is the oldest and most stable communication technology.
Before text, we had oral story telling. It allowed us to communicate one generation’s knowledge to the next, and so on.
Arguably this is present elsewhere in the animal kingdom (orcas, elephants, etc.), but human language proves to be the most complex.
Side note: one of my favorite examples is from the Gunditjmara (a group of Aboriginal Australians) who recall a volcanic eruption from 30k+ years ago [0].
Written language (i.e. text) is unique, in that it allows information to pass across multiple generations, without a man-in-the-middle telephone-like game of storytelling.
But both are similar: text requires you to read, in your own voice, the thoughts of another. Storytelling requires you to hear a story, and then communicate it to others.
In either case, the person is required to retell the knowledge, either as an internal monologue or as an external broadcast.
Always bet on language.
I'm a linguist, and I've worked in endangered languages and in minority languages (many of which will some day become endangered, in the sense of not having native speakers). The advantage of plain text (Unicode) formats for documenting such languages (as opposed to binary formats like Word used to be, or databases, or even PDFs) is that text formats are the only thing that will stand the test of time. The article by Steven Bird and Gary Simons "Seven Dimensions of Portability for Language Documentation and Description" was the seminal paper on this topic, published in 2002. I've given later conference talks on the topic, pointing out that we can still read grammars of Greek and Latin (and Sanskrit) written thousands of years ago. And while the group I led published our grammars in paper form via PDF, we wrote and archived them as XML documents, which (along with JSON) are probably as reproducible a structured format as you can get. I'm hoping that 2000 years from now, someone will find these documents both readable and valuable.
There is of course no replacement for some binary format when it comes to audio.
(By "binary" format I mean file formats that are not sequential and readily interpretable, whereas text files are interpretable once you know the encoding.)
You rightly mention Unicode, as before that there was a jungle of formats. I have some in UTF-16, some in SJIS, a ton in EUC, others were already UTF-8, many don't have a BOM. I could try each encoding and see what works for each of the files (except on mobile... it's just a PITA to deal with that on mobile).
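The "try each encoding and see what works" approach looks roughly like this in practice. It is a sketch only: the candidate list and its order are assumptions. Legacy encodings overlap, so a clean decode is not proof of the right answer, and UTF-16 goes last because it accepts almost any even-length input.

```python
# Candidate order is a heuristic, not a rule: strictest first, UTF-16 last.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp", "utf-16")

def decode_best_effort(data: bytes, candidates=CANDIDATES):
    """Return (encoding_name, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding decoded the data cleanly")

sample = "日本語"
print(decode_best_effort(sample.encode("shift_jis")))
print(decode_best_effort(sample.encode("utf-8")))
```

For a real archive you would still want to eyeball the result, since a wrong guess can decode without error and simply produce the wrong characters.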
But in comparison there's a set of files I never had issues opening now and then: PDFs and JPEGs. All the files that my scanner produced are still readable absolutely everywhere. Even with slight bitrot they're readable, and with the current OCR processes I could probably put it all back in text if ever needed.
If I had to archive more stuff now and can afford the space, I'd go for an image format without hesitation.
PS: I'm surprised you don't mention the Unicode character limitations for minority languages or academic use. There will still be characters that either can't be represented, or don't have an exact 1 to 1 match between the code point and the representation.
Similarly, cave paintings express the painting someone intended to make better than a textual description of it.
Our image models got good when we started making shared image and text embedding spaces. A picture is worth 1000 words, but 1000 words about millions of images are what allowed us to teach computers to see.
Is going back and forth dozens of times to explain what we actually want, while the model burns an inordinate amount of processing power at each turn, a model of efficiency or effectiveness?
It might be convenient and allow for exploration, the cost might be worth it in some cases, but I wouldn't call it "effective".
2021 (570 points, 339 comments) https://news.ycombinator.com/item?id=26164001
2015 (156 points, 69 comments) https://news.ycombinator.com/item?id=10284202
2014 (355 points, 196 comments) https://news.ycombinator.com/item?id=8451271
As for the original article, there are no "text [systems]" (or there are, just as there are "number [systems]": made up). "Text" like this very thing you are reading is a 2D drawing. There are no character glyphs of any kind (Latin, logograms, etc.) defined by the universe*; they are human-invented and stored/interpreted at the human collective level. Computers don't know anything about text, only "numbers" of some bit width, and with those numbers a system must be created that can map a number representation to a drawing by some method (e.g. with a bitmap). Also, there is a big difference between formal/executable languages and natural human languages. Anyway, it's not about some text format/encoding; it's about the human/computer-defined and interpreted non-linguistic meaning behind it (Wittgenstein).
* DNA/RNA can be one such "universal character glyph/string", as the "textual" information is physically constructed and interpreted.
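The number-to-drawing mapping mentioned above can be shown with a toy bitmap font. The `FONT` table here is invented for illustration (a single hand-drawn 5x7 glyph); real fonts are vastly more involved, but the principle is the same: the computer stores 65, and a lookup turns it into pixels.

```python
# Hypothetical 5x7 bitmap font with one glyph. Each int is one row of pixels;
# the highest of the 5 bits is the leftmost pixel.
FONT = {
    65: (0b01110, 0b10001, 0b10001, 0b11111, 0b10001, 0b10001, 0b10001),
}

def render(ch: str, width: int = 5) -> list[str]:
    """Map a character to its code point, then the code point to a drawing."""
    rows = FONT[ord(ch)]
    return [
        "".join("#" if (row >> (width - 1 - col)) & 1 else "." for col in range(width))
        for row in rows
    ]

print(ord("A"))  # 65, the only thing the computer actually stores
print("\n".join(render("A")))
```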
Anything below 3 is considered "partially illiterate".
I've been thinking about this a lot recently, as someone who cares about technical communication and making technical topics accessible to more people.
Maybe wannabe educators like myself should spend more time making content for TikTok or YouTube!
Technical topics demand a technical treatment, not 30-second junk food bites of video infotainment that then imbue the ignorant audiences with the semblance or false feeling of understanding, when they actually possess none. This is why we have so many fucking idiots dilating everywhere on topics they haven't a clue on - they probably saw a fucking YouTube video and now consider themselves in possession of a graduate degree in the subject.
Rather than try to widely distribute and disseminate knowledge, it would be far more prescient to capitalize on what will soon be a massive information asymmetry and widening intellectual inequality between the reads and the read-nots, accelerated by the production of machine generated, misinformative slop at scale.
A "dumb" example would be IKEA manuals that describe an assembly algorithm. I could imagine a lot of other situations where you want to convey very specific and technical information in a form that doesn't rely on a specific language (especially if languages aren't shared).
Color coding, shape standards etc. also go in that direction. The efficiency gain is just so big.