Posted by ntnbr 7 days ago
A test I did myself was to ask Claude (the LLM from Anthropic) to write working code for entirely novel instruction set architectures (e.g., custom ISAs from the game Turing Complete [5]), which is difficult to reconcile with pure retrieval.
[1] Lovelace, A. (1843). Notes by the Translator, in Scientific Memoirs Vol. 3. ("The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.") Primary source: https://en.wikisource.org/wiki/Scientific_Memoirs/3/Sketch_o.... See also: https://www.historyofdatascience.com/ada-lovelace/ and https://writings.stephenwolfram.com/2015/12/untangling-the-t...
[2] https://academic.oup.com/mind/article/LIX/236/433/986238
[3] https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf
[4] https://web.stanford.edu/class/sts145/Library/life.pdf
[5] https://store.steampowered.com/app/1444480/Turing_Complete/
Unfortunately, its corpus is bound to contain noise and nonsense that follows no formal reasoning system but contributes to the ill-advised idea that an AI should sound like a human to be considered intelligent. So perhaps it is not a bag of words but a bag of probabilities. This is important because the fundamental problem is that an LLM is not able, by design, to correctly model the most fundamental precept of human reason, namely the law of non-contradiction. An LLM must, I repeat, must assign nonvanishing probability to both sides of a contradiction. What's worse, the winning side loses: long chains of reasoning are modelled probabilistically, so the longer the chain, the less likely an LLM is to follow it. Moreover, whenever there is actual debate on an issue, such that the corpus is ambiguous, the LLM necessarily becomes chaotic on that issue.
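To make the two probability claims above concrete, here is a minimal sketch in Python. It assumes a softmax output layer and, as a simplification of my own, that each reasoning step succeeds independently with the same probability. The names `softmax` and `chain_probability` and the example logits are hypothetical, and this is a toy illustration of the argument, not a model of any real LLM.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def chain_probability(p_step, n_steps):
    """Probability of completing n steps, ASSUMING independent steps
    that each succeed with probability p_step (a simplification)."""
    return p_step ** n_steps

# Hypothetical logits for "A" and "not A": the model strongly prefers "A",
# yet "not A" still gets a small, strictly positive probability.
probs = softmax([8.0, -8.0])

# Even a very reliable step compounds badly over a long chain.
short_chain = chain_probability(0.99, 10)
long_chain = chain_probability(0.99, 100)
```

Mathematically, softmax over finite logits never returns exactly zero, although floating-point arithmetic can underflow to zero for extreme logits. Under the independence assumption, a 99% reliable step gives a chain of 100 steps only about a 37% chance of completing.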
I literally just had an AI prove the foregoing with some rigor, and in the very next prompt I asked it to check my logical reasoning for consistency, and it claimed it was able to do so (->|<-).
A practically infinite library where both gibberish and truth exist side by side.
The trick is navigating the library correctly. Except in this case you can’t reliably navigate it. And if you happen to stumble upon some “future truth” (i.e. new knowledge), you still need to differentiate it from the gibberish.
So a “crappy” version of the Library of Babel. Very impressive, but the caveats significantly detract from it.
I've been learning more about roses lately, and the amount of information on them varies so much because the world roses live in is equally varied. LLMs make for a better search engine, but you still need to develop your own internal models. Worse yet, if LLMs continue to be refined off of cul-de-sac conclusions, then all the wisdom of the journey is lost both to the consumer and to the LLM itself.
But the truth is there has been a major semantic shift. Previously LLMs could only solve puzzles whose answers were literally in the training data. They could answer a math puzzle they had seen before, but if you rephrased it only slightly they could no longer answer.
But now, LLMs can solve puzzles where, like, they have seen a certain strategy before. The newest IMO and ICPC problems were only "in the training data" for a very, very abstract definition of training data.
The goalposts will likely have to shift again, because the next target is training LLMs to independently perform longer chunks of economically useful work, interfacing with all the same tools that white-collar employees do. It's all LLM slop 'til it isn't, same as the IMO or Putnam exam.
And then we'll have people saying that "white collar employment was all in the training data anyway, if you think about it," at which point the metaphor will have become officially useless.
The defenders are right insofar as the (very loose) anthropomorphizing language used around LLMs is justifiable to the extent that human beings also rely on disorder and stochastic processes for creativity. The critics are right insofar as equating these machines to humans is preposterous and mostly relies on significantly diminishing our notion of what "human" means.
Both sides fail to meet the reality that LLMs are their own thing, with their own peculiar behaviors and place in the world. They are not human and they are somewhat more than previous software and the way we engage with it.
However, the defenders are less defensible insofar as their take is mostly used to dissimulate in efforts to make the tech sound more impressive than it actually is. The critics at least have the interests of consumers and their full education in mind—their position is one that properly equips consumers to use these tools with an appropriate amount of caution and scrutiny. The defenders generally want to defend an overreaching use of metaphor to help drive sales.
They are search engines that can remix results.
I like this one because I think most modern folks have a usefully accurate model of what a search engine is in their heads, and also what "remixing" is, which adds up to a better metaphor than "human machine" or whatever.
I would heartily embrace an "AI-to-Bag of Words" browser plugin.
But even more than that, today’s AI chats are far more sophisticated than probabilistically producing the next word. Mixture-of-experts architectures route each token to different expert subnetworks within the model. Agents are able to search the web, write and execute programs, or use other tools. This means they can actively seek out additional context to produce a better answer. They also have heuristics for deciding whether an answer is correct or whether they should use tools to try to find a better answer.
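For readers unfamiliar with the routing idea mentioned above, here is a rough sketch in Python of top-k mixture-of-experts gating. In the usual formulation, the router chooses among expert sub-networks inside a single model. Everything here is a toy assumption: the names `top_k_route` and `moe_forward`, the four lambda "experts", and the hand-picked gate scores (in a real model these come from a learned layer applied to the input). It does not describe any particular product's architecture.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    m = max(gate_logits[i] for i in top)  # for numerical stability
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical experts: tiny functions standing in for sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]

# Hypothetical gate scores, fixed by hand for illustration.
gate_logits = [0.1, 2.0, -1.0, 2.0]

weights = top_k_route(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
```

Only the selected experts are evaluated for a given input, which is the point of the design: capacity grows with the number of experts while the compute per token stays roughly fixed.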
The article is correct that they aren’t humans and they have a lot of behaviors that are not like humans, but oversimplifying how they work is not helpful.
"The machine accepts Chinese characters as input, carries out each instruction of the program step by step, and then produces Chinese characters as output. The machine does this so perfectly that no one can tell that they are communicating with a machine and not a hidden Chinese speaker.
The questions at issue are these: does the machine actually understand the conversation, or is it just simulating the ability to understand the conversation? Does the machine have a mind in exactly the same sense that people do, or is it just acting as if it had a mind?"
Here's one fun approach (out of 100s):
What if we answer the Chinese room with the Systems Reply [1]?
Searle countered the systems reply by saying he would internalize the Chinese room.
But at that point it's pretty much exactly the Cartesian theater [2]: with room, homunculus, implement.
But the Cartesian theater is disproven, because we've cut open brains and there's no room in there to fit a popcorn concession.
I think there is some validity to the Cartesian theater, in that the whole of the experience that we perceive with our senses is at best an interpretation of a projection or subset of "reality."