Posted by obscurette 7/4/2025
Kind of a weird opposite meaning of book-burning.
https://x.com/WallStreetApes/status/1940924371255939236
Our software is like that. A small system will have a crazy number of packages and dependencies. It's not healthy and it's expensive and it's stupid.
Culture tends to drive drunk, swinging wide to extremes and then over-correcting. We're already fully in the wrong lane when we start to have discussions about thinking about the possibility of change.
https://www.youtube.com/watch?v=5ODzO7Lz_pw
It's not a software thing, it's just how humanity works.
The physical world is bound by rules that are unchanging (more or less). Above this layer we’ve also devised and agreed upon standards that remain unchanging, though they vary by region: voltage, screw/bolt sizes, tolerance levels, materials, material measurements, etc.
At this layer, we’ve commoditized and standardized because it’s useful: It makes the components cost-effective and predictable.
In software and computing, I can only think of the low-level networking standards that remain stable. And even those have to be reinvented somewhat for each OS or each new language.
Everything else seems to be reinvented or rewritten, and then versioned.
Imagine having to upgrade the nuts and bolts in your car to v3.01 or lose support.
Ingredients in the cookies? Yes. 100 of them? No.
Slightly OT: It's interesting how many (smart!) people in tech like the author of this article still can't conceptualize the difference between training objective and learned capability. I wonder at this point if it's a sort of willful ignorance adopted as a psychological protection mechanism. I wonder if they're going to experience a moment of severe shock, just gradually forget that they held these opinions, or take on a sort of delusional belief that AI can't do XYZ despite all mounting evidence to the contrary.
LLMs' initial training is specifically for token-prediction.
However, this doesn't mean that what they end up doing is specifically token-prediction (except in the sense that anything that generates textual output can be described as doing token-prediction). Nor does it mean that the only things they can do are tasks most naturally described in terms of token-prediction.
For instance, suppose you successfully train something to predict the next token given input of the form "[lengthy number] x [lengthy number] = ", where "successfully" means that the system ends up able to predict correctly almost all the time even when the numbers are ones it hasn't seen before. How could it do that? Only by, in some sense, "learning to multiply". (I haven't checked but my hazy recollection is that somewhere around GPT-3.5 or GPT-4 LLMs went from not being able to do this at all to being able to do it fairly well on moderate-sized numbers.)
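(A quick back-of-the-envelope in Python makes the point; the choice of "8-digit numbers" is my own illustrative assumption, not anything from a real training setup:)

    # Count of distinct 8-digit-by-8-digit multiplication problems (illustrative choice).
    eight_digit_numbers = 9 * 10**7          # 10,000,000 .. 99,999,999
    problems = eight_digit_numbers ** 2      # distinct ordered pairs
    print(f"{problems:.1e}")                 # ~8.1e+15 -- far too many to have been memorized from any corpus

So getting unseen pairs right can't be lookup; something algorithm-like has to have been learned.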
Or suppose you successfully train something to complete things of the form "The SHA256 hash of [lengthy string] is "; again, a system that could do that correctly would have to have, in some sense, "learned to implement SHA256". (I am pretty sure that today's LLMs cannot do this, though of course they might have learned to call out to a tool that can.)
If you successfully train something to complete things of the form "One grammatical English sentence whose SHA256 hash is [value] is " then that thing has to have "learned to break SHA256". (I am very sure that today's LLMs cannot do this and I think it enormously unlikely that any ever will be able to.)
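(For concreteness, here's a minimal Python sketch of that asymmetry; the example string is my own, arbitrary choice:)

    import hashlib

    # Forward direction: computing a SHA256 digest is trivial, for a program or for
    # a model that can call out to one.
    sentence = b"One grammatical English sentence."   # arbitrary example input
    print(hashlib.sha256(sentence).hexdigest())

    # Inverse direction: given only a digest, no method better than guess-and-check
    # over an astronomically large space is known, which is why "learning to break
    # SHA256" from training examples is not a realistic outcome.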
If you successfully train something to complete things of the form "The complete source code for a program written in idiomatic Rust that does [difficult task] is " then that thing has to have "learned to write code in Rust". (Today's LLMs can kinda do some tasks like this, and there are a lot of people yelling at one another about just how much they can do.)
That is: some token-prediction tasks can only be accomplished by doing things that we would not normally think of as being about token prediction. This is essentially the point of the "Turing test".
For the avoidance of doubt, I am making no particular claims (beyond the illustrative ones explicitly made above) about what if anything today's LLMs, or plausible near-future LLMs, or other further-future AI systems, are able to do that goes beyond what we would normally think of as token prediction. The point is that whether or not today's LLMs are "just stochastic parrots" in some useful sense, it doesn't follow from the fact that they are trained on token-prediction that that's all they are.
My Vectrex still worked last I checked.
But the ~1980s corporation is no more, and it was driven by the hype cycle too; it's just not one we recognize today. You can google the adverts or read The Soul of a New Machine.
The methods and algorithms powering advances in modern science, medicine, communications, entertainment, etc. would be impossible to develop, much less run, on something so rudimentary as a TI-99/4A. The applications we harness our technology for have become much more sophisticated, and so too must the technology stacks underpinning them, to the point that no single individual can understand everything.

Take something as simple as real-time video communication, something we take for granted today. There is no single person in the world who deeply understands every single aspect, from the semiconductor engineering involved in manufacturing display and image sensors, to the electronics engineering behind communicating with the display/sensor, to the signal processing and compression algorithms used to encode the video, to the network protocols used to actually transmit it, to the operating system kernel's scheduler, which has to perform at sufficiently low latency to run the video chat app.
By analogy, one can understand and construct every component of a mud hut or log cabin, but no single person is capable of understanding, much less constructing, every single component of a modern skyscraper.
He's criticizing the act of _not building_ on previous learnings. _It's in the damn title_.
Repeating mistakes from the past leads to a slowdown in such advancements.
This has nothing to do with learning everything by yourself (which, by the way, is a worthy goal, and everyone who tries knows full well that it cannot be done; the point isn't actually to do it).
If it really were just division of labor, beneficial abstraction, shoulders of giants, etc., shouldn't we be able to distinguish genuinely new concepts from things we already had 40 years ago in a different context?
This is called “good journalism”. It would be great if Elektor tried practicing it.