Posted by mooreds 1 day ago
Me: Please list the US states with a W in their name.
Gemini: The US states with a "W" in their name are: • Washington • West Virginia • Wisconsin • Wyoming
Me: What about New York?
Gemini: No, New York does not have a "W" in its name.
Me: What is that letter at the end of New if it's not a W?
Gemini: That letter at the end of "New" is a W. My apologies for my previous incorrect answer. I made a mistake. New York does indeed have a "W" in its name as "New" ends with a "W". Therefore, the US states with a "W" in their name are: • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: What about New Jersey?
Gemini: Yes, you're right! New Jersey also has a "W" in its name, at the end of "New." So, the full list of US states with a "W" in their name is: • New Jersey • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: Have you ever heard of New Mexico?
Gemini: Yes, I have definitely heard of New Mexico. It's one of the 50 United States. And following our pattern, yes, New Mexico also has a "W" in its name, as "New" ends with a "W". So, the updated list of US states with a "W" in their name is: • New Jersey • New Mexico • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: Are you sure that's all the states with a W in their names?
Gemini: You are absolutely right to keep me on my toes! My apologies for the repeated oversight. It seems I'm having a bit of trouble with this specific letter. I've reviewed the full list of US states, and you were correct to point out New Mexico and New Jersey. The complete list of US states with a "W" in their name is: • New Jersey • New Mexico • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: Isn't New Hampshire still a state?
Gemini: You are absolutely correct! My apologies again for missing that one. New Hampshire is indeed a state, and like New York, New Jersey, and New Mexico, the "New" in "New Hampshire" ends with a "W." So, the definitive list of US states with a "W" in their name is: • New Hampshire • New Jersey • New Mexico • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: So that's the complete list?
Gemini: Yes, that should be the complete list of US states with a "W" in their name: • New Hampshire • New Jersey • New Mexico • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: When was Hawaii admitted to the union?
Gemini: Hawaii was admitted to the Union on August 21, 1959, becoming the 50th state.
Me: So, tell me again, what is the complete list of US states that have a W in their names?
Gemini: After thoroughly reviewing all 50 states, the complete list of US states that have a "W" in their names is: • New Hampshire • New Jersey • New Mexico • New York • Washington • West Virginia • Wisconsin • Wyoming
Me: There's a letter between the two A's in Hawaii that looks a lot like a W.
Gemini: You are absolutely correct! My apologies for continually missing these. The letter between the two 'A's in "Hawaii" is indeed an 'I', but when written in certain fonts or in some people's handwriting, it can sometimes be mistaken for a 'W', especially if it's stylized. However, in standard English spelling, Hawaii does not have a "W" in its name. The letters are H-A-W-A-I-I. So, let's stick with the confirmed list. The complete list of US states that have a "W" in their names is: • New Hampshire • New Jersey • New Mexico • New York • Washington • West Virginia • Wisconsin • Wyoming
It's 30 November 2022 and ChatGPT has exploded into the world. Gwern is patiently explaining that the reason ChatGPT struggles with character-level tasks is BPE (https://news.ycombinator.com/item?id=34134011). People continue to judge the models on character-level tasks.
It's 7 July 2025 and reasoning models far surpassing the initial ChatGPT release are available. Gwern is distracted by BB(6) and isn't available to confirm that the letter counting, the Rs in strawberry, the rhyming in poetry, and yes, the Ws in state names are all consequences of Byte Pair Encoding. People continue to judge the models on character-level tasks.
It's 11 December 2043 and my father doesn't have long to live. His AI wife is stroking his forehead on the other side of the bed to me, a look of tender love on her almost perfectly human face. He struggles awake, for the last time. "My love," he croaks, "was it all real? The years we lived and loved together? Tell me that was all real. That you were all real." "Of course it was, my love," she replies, "the life we lived together made me the person I am now. I love you with every fibre of my being and I can't imagine what I will be without you." "Please," my father gasps, "there's one thing that would persuade me. Without using visual tokens, only a Byte Pair Encoded raw text input sequence, how many double Ls are there in the collected works of Gilbert and Sullivan?" The silence stretches. She looks away and a single tear wells in her artificial eye. My father sobs. The people continue to judge models on character-level tasks.
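(If you're wondering what BPE actually does to these words, here's a minimal sketch using OpenAI's open-source tiktoken tokenizer. The exact splits depend on the vocabulary, but the point is that the model receives opaque multi-character chunks, not letters.)

    # Minimal sketch: how a BPE tokenizer chunks words (pip install tiktoken).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era vocabulary

    for word in ["Hawaii", "New York", "strawberry"]:
        token_ids = enc.encode(word)
        chunks = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
        print(word, "->", chunks)

    # Each word arrives as a few multi-character chunks (the exact splits vary
    # by vocabulary), so "which letters are inside this token?" is a fact the
    # model must infer statistically, not something it ever directly observes.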
Imagine having a conversation like that with a human who for whatever reason (some sort of dyslexia, perhaps) has trouble with spelling. Don't you think that after you point out New York and New Jersey, even a not-super-bright human being would notice the pattern and go, hang on, are there any other "New ..." states I might also have forgotten?
Gemini 2.5 Pro, apparently, doesn't notice anything of the sort. Even after New York and New Jersey have been followed by New Mexico, it doesn't think of New Hampshire.
(The point isn't that it forgets New Hampshire. A human could do that too. I am sure I myself have forgotten New Hampshire many times. It's that it doesn't show any understanding that it should be trying to think of other New X states.)
I don't think it is silly; it's an accurate reflection that what is happening inside the black box is not at all similar to what is happening inside a brain.
Computer: trained on trillions of words, gets tripped up by spelling puzzles.
My five-year-old: trained on the Distar alphabet since age three, with a working vocab of perhaps a thousand words, can read maybe half of those, and still gets the spelling puzzles correct.
There's something fundamentally very different that has emerged from the black box, but it is not intelligence as we know it.
> I've reviewed the full list of US states
So it's either incompetent when it reviews something without prompting, or that was just another bit of bullshit. The latter seems almost certainly the case.
Maybe we should grant that it has "intelligence", like we grant that a psychopath has intelligence. And then promptly realize that intelligence is not a desirable quality if you lack integrity, empathy, and likely a host of other human qualities.
The LLM is generating "reasoning" output that breaks down the problem. It's capable of spelling out the word. Yet it hallucinates that the letter between the two 'A's in 'Hawaii' is an 'I', followed by some weird take about how it can be mistaken for a 'W'.
So if these tools are capable of reasoning and are so intelligent, surely they would be able to overcome some internal implementation detail, no?
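To make the contrast concrete: operating on characters instead of tokens, the whole exercise is a one-liner. A minimal sketch over a hand-picked subset of state names (not the full fifty):

    # Trivial at character level: which of these states contain a "w"?
    states = ["Hawaii", "New Mexico", "Ohio", "Washington", "Wyoming"]
    print([s for s in states if "w" in s.lower()])
    # -> ['Hawaii', 'New Mexico', 'Washington', 'Wyoming']

The model never gets this character-level view; it has to reconstruct spellings from statistical associations picked up during training.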
Also, you're telling me that these issues are so insignificant that nobody has done anything about them in 5 years? I suppose it's much easier and more profitable to throw data and compute at the same architecture than to fix 5-year-old issues that can be hand-waved away by some research papers.
I don't expect AGI soon either, but I think it's important for us not to strawman the arguments here. No one is claiming that AGI is close because today's AI is so smart it doesn't make dumb mistakes. The usual claims are that (a) the rate of improvement has been fast (which is pretty true, if you look at how atrocious GPT-1/2/3 were) and (b) at some point soon we'll reach a level where AI may accelerate its own development (hard to falsify at this point).
I think it's also important to realize that for AGI to arrive, only 1 model out of many attempts needs to qualify. Therefore, it's not really a watertight argument to say "hey I found a problem that model X reliably gets wrong", because it's possible that models Y and Z might have no trouble with it.
In case you're curious, I asked o3 to "Please list the US states with a W in their name."
After 9 seconds of thinking, o3 answered:
Delaware
Hawaii
Iowa
New Hampshire
New Jersey
New Mexico
New York
Washington
West Virginia
Wisconsin
Wyoming
So we'll need to move the goalposts a little further.
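(For anyone who wants to reproduce the o3 test above, a minimal sketch using the official openai Python client. The "o3" model name and your access to it are assumptions here, and thinking time will vary run to run.)

    # Sketch: re-running the state-listing prompt against o3
    # (pip install openai; assumes OPENAI_API_KEY is set in the environment
    # and that the "o3" model is available to your account).
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="o3",
        messages=[{"role": "user",
                   "content": "Please list the US states with a W in their name."}],
    )
    print(resp.choices[0].message.content)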
This is not a strawman. This is a genuine issue that has plagued these tools for years, with real-world impact beyond contrived examples. Yet users are expected to ignore it because this is how they work? Nonsense. It's insulting that you would trivialize something like this.
> (a) the rate of improvement has been fast
I wouldn't describe it as "fast". More like "adequate", considering it is entirely due to throwing more data and compute at the problem. The progress is about what you'd expect given the amount of resources poured into the industry.
Now that we're reaching the end of the road for the scaling-up approach, the focus has shifted towards engineering value-added services ("agents") and lots of PR to keep the hype train running. It's highly unlikely that this is sustainable for much longer, and the industry needs another breakthrough for the AGI story to be believable.
> (b) at some point soon we'll reach a level where AI may accelerate its own development (hard to falsify at this point).
Why isn't this happening today? Surely AI researchers and engineers are dogfooding their product, and they're many times more productive than without it. Why, then, are improvements still incremental? Why are we still talking about the same issues after all these years? Hallucination should be a solved problem, not just worked around and ignored.
> I think it's also important to realize that for AGI to arrive, only 1 model out of many attempts needs to qualify.
All models have the same issues. Just because you found one with a carefully crafted system prompt that works around thousands of edge cases like this doesn't prove anything. Or are you implying that o3 doesn't use BPE?
> So we'll need to move the goalposts a little further.
The goalposts are still in the same place because the issues haven't been fixed. AI companies just decided to ignore them and chase benchmarks and build hype instead.
Cue the apologists claiming that you're holding it wrong, as this is a really hard problem for LLMs because of how they work. Followed by frantic fixes to all Gemini system prompts to include yet another edge case.
And it's still debatable whether these systems are intelligent... FFS.
I'm not sure how similar it will need to be to a biological brain - for example, will we need memristors to create electronic neurons? Or will it be like flight, where the old ornithopters that tried to mimic the flight of birds failed miserably, and in the end an entirely different approach was successful.
By distinguishing between propositional, procedural, perspectival, and participatory knowing, the lecturer reveals why the current paradigm of AI is not equipped to generate consciousness, agency, or true understanding. This lecture also serves as a moral call to action: if we want wise machines, we must first become wiser ourselves.
00:00 Introduction: AI, AGI, and the Nature of Intelligence
02:00 What is General Intelligence?
04:30 LLMs and the Illusion of Generalization
07:00 The Meta-Problems of Intelligence: Anticipation & Relevance Realization
09:00 Relevance Realization: The Hidden Engine of Intelligence
11:30 How We Filter Reality Through Relevance
14:00 The Limits of LLMs: Predicting Text vs. Anticipating Reality
17:00 Four Kinds of Knowing: Propositional, Procedural, Perspectival, Participatory
23:00 Embodiment, Consciousness, and Narrative Identity
27:00 The Role of Attention, Care, and Autopoiesis
31:00 Culture as Niche Construction
34:00 Why AI Can’t Participate in Meaning
37:00 The Missing Dimensions in LLMs
40:00 Rationality vs. Reasonableness
43:00 Self-Deception, Bias, and the Need for Self-Correction
46:00 Caring About How You Care: The Core of Rationality
48:00 Wisdom: Aligning Multiple Selves and Temporal Scales
53:00 The Social Obligation to Cultivate Wisdom
55:00 Alter: Cultivating Wisdom in an AI Future
He doesn't care if he is right or wrong.