Posted by olalonde 3 days ago

Stochastic Parrots: Frequently Unasked Questions(medium.com)
47 points | 45 comments
hellohello2 3 hours ago|
"Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind."

Modelling text describing the world is not modelling (some aspect) of the world?

Modelling the probability that a reader likes or dislikes a piece of text is not modelling (some aspect) of a reader's state of mind?

qsera 14 minutes ago||
>Modelling text describing the world is not modelling (some aspect) of the world?

The text describes the world to humans. This is the crucial thing that you miss. It is very subjective.

Imagine that you learn the grammar of a foreign language without learning the meaning of the words. You might be able to make grammatically valid sentences. But you will still not understand a single thing that something written in that language describes.

tootie 2 hours ago||
No? There's no model involved. It's all just probabilistic. LLMs understand what you're thinking as well as a mood ring.
roenxi 2 hours ago|||
It isn't possible to have "just probabilistic" (maybe a philosophical exception could be made for a uniform random distribution or whatever provides the little dose of randomness required to get nondeterministic results). Probabilities are always in context of a model. LLMs model language but language itself is a model of something else. My money would have been on language modelling nonsense, but that is quite clearly not the case. Turns out it models the world and so do LLMs.
aoeusnth1 2 hours ago||||
The model is the thing which is learned in order to make the probabilistic prediction with low entropy.
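To make the point about "low entropy" concrete, here is a toy sketch. It is not how LLMs are actually trained. It fits a smoothed bigram model to a tiny made-up corpus and scores it by cross-entropy against a uniform guess. The corpus, the add-one smoothing, and the function names are all assumptions for illustration. The learned probability table is the "model", and fewer bits per token means sharper prediction. The score is measured on the training text itself, so it only shows the direction of the effect.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens, vocab, alpha=1.0):
    """Fit add-alpha smoothed bigram probabilities P(next | prev)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    model = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + alpha * len(vocab)
        model[prev] = {w: (counts[prev][w] + alpha) / total for w in vocab}
    return model

def cross_entropy_bits(model, tokens):
    """Average negative log2-probability the model assigns to each next token."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log2(model[p][n]) for p, n in pairs) / len(pairs)

# Hypothetical toy corpus; a real LM trains on vastly more text.
corpus = "the cat sat on the mat the cat sat on the hat".split()
vocab = sorted(set(corpus))

# Baseline "model" that knows nothing: every next token equally likely.
uniform = {p: {w: 1 / len(vocab) for w in vocab} for p in vocab}
bigram = train_bigram(corpus, vocab)

h_uniform = cross_entropy_bits(uniform, corpus)
h_bigram = cross_entropy_bits(bigram, corpus)
print(f"uniform: {h_uniform:.2f} bits/token")
print(f"bigram:  {h_bigram:.2f} bits/token")
```

The uniform baseline always costs log2 of the vocabulary size per token. Whatever the learned table captures about which word tends to follow which is exactly what lowers the number.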
hellohello2 2 hours ago||||
The literal definition of a model is "an informative representation of an object, person, or system". I think you mean something else though, what are you trying to express exactly?
afthonos 2 hours ago|||
Nothing about an LLM is “just”. In what precise sense do you mean it is probabilistic?
majormajor 21 minutes ago||
There's a reason stochastic was used in the original phrase instead of "probabilistic."

While most inference executions are intentionally non-deterministic, even a purely deterministic one would still be stochastic in that the model itself was built in a process such that the statistical frequency, sequencing, etc., of the training text and follow-up processes all heavily influence the result.

Because of that, the output is the sort of thing that is not expected to generate 100% perfect output 100% of the time, but to have a good probability of being like-in-kind-to-the-training-data (and useful/relevant as a result).

(As compared to a non-stochastic model, like arithmetic on integers, where 2+2 is always gonna be 4 and you don't have a chance of coming up with some novel pair of inputs to addition that will cause your arithmetic to miss the mark.)

siegecraft 58 minutes ago||
> Most things we historically do with computing are not well approximated by extruding synthetic text.

I don't understand this point. I feel like almost everything associated with computing is extruding synthetic text.

advisedwang 24 minutes ago||
Just to name some of the main things I think of computers doing, especially with a historical lens: analyzing data, processing transactions, simulating dynamics of physical systems, controlling electronic parts of devices, providing entertainment, encoding/decoding audio/video/text. I think these are the kinds of things that Dr Bender is saying are not well suited to textual tools.
majormajor 41 minutes ago||
It seems like a criticism that's actually a hint at a bigger point. The entire appeal/hype is due to the promise of doing things that historically computers have not done well.

That's captured elsewhere - attempts to create "synthetic human behavior" - but mostly around ethics vs practical function or consumer appeal.

Even just a "stochastic parrot" can be extremely valuable if the parrot is fast enough and can connect enough dots in a human-reasoning-style to say things like "what could come after a description of a problem, some background info, and a question about what could have caused the problem? Probably a relevant hypothesis that fits the background facts and the problem description" and then generate a high-probability-fitting sequence of text to spit out.

There doesn't need to be any more intent in that than just "predict what would be the next text that would be similarly connected to the previous in the same way text in the model training process would." It doesn't need to be intending to solve the problem if the hit rate is good enough such that predicting how someone else would describe the solution is often the same as actually "intending" to solve it...

Nor does the ability to predict things stochastically mean that there isn't any symbolic way to do the same. Quite possibly the stochastic process is just a brute-force rough approximation of what a true symbolic model could do. IMO the success of the stochastic approach is exactly in line with the existence of some sort of underlying structure/system. (Though such a system would have to be incredibly complex to support all the crazy things we do with language.)

libraryofbabel 3 hours ago||
It would have been nice to see some version of “I am very surprised by how far LLMs have come since I wrote the stochastic parrots paper, here is how I have revised my thinking.” But there is nothing like that and the author is just doubling down or trying to correct perceived “misinterpretations” of her work.

Meanwhile you have multiple Fields Medalists (Tao, Gowers) saying they’re very impressed by LLMs’ mathematical reasoning, something that the stochastic parrots thesis (if it has any empirically-predictive content at all) would predict was impossible. I doubt Tao and Gowers thought much of LLMs a few years ago either. But they changed their minds. Who do you want to listen to?

I think it’s time to retire the Stochastic Parrots metaphor. A few years ago a lot of us didn’t think LLMs would ever be capable of doing what they can do now. I certainly didn’t. But new methods of training (reinforcement learning with verifiable rewards, RLVR) changed the game and took LLMs far beyond just reducing cross entropy on huge corpuses of text. And so we changed our opinions. Shame Emily Bender hasn’t done the same.

Sigh.

ageedizzle 31 minutes ago||
It's clear from this comment that you did not read the full article. If you did then you'd have seen that the author addresses this criticism you're making here.
marshray 1 hour ago|||
The Parrots paper:

"Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot."

So perhaps this has always been a negative claim, about what language model AI is not.

majormajor 27 minutes ago|||
> "Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot."

and

> "Meanwhile you have multiple Fields Medalists (Tao, Gowers) saying they’re very impressed by LLMs’ mathematical reasoning, something that the stochastic parrots thesis (if it has any empirically-predictive content at all) would predict was impossible. I doubt Tao and Gowers thought much of LLMs a few years ago either. But they changed their minds. Who do you want to listen to?"

I don't understand how these things are supposedly incompatible.

Larger models and other further refinements reduce the "haphazardness" of produced text. A big enough model with enough semantic connections between different words/phrasings/etc., plus enough logical connections for how cause and effect, or question and answer, work in human language, can obviously stitch together novel sequences when presented with novel prompts. (The output has not been limited to sequences of n words that appeared 1:1 in the training data, for any n, for at least three and a half years now, if not since the paper was written.)
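One rough way to test the parenthetical claim is to measure what fraction of an output's n-grams never occur in the training text. The sketch below uses a made-up two-sentence corpus and hypothetical helper names (`ngrams`, `novelty`). It shows a sentence built entirely from previously seen bigrams, trigrams, and 4-grams that is nonetheless new as a whole.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(output, training, n):
    """Fraction of the output's n-grams that never appear in the training text."""
    out = ngrams(output, n)
    if not out:
        return 0.0  # output shorter than n: nothing to compare
    seen = ngrams(training, n)
    return len(out - seen) / len(out)

# Hypothetical toy "training data" and "model output".
training = "the cat sat on the mat . the dog sat on the rug .".split()
output = "the dog sat on the mat .".split()

for n in range(1, 8):
    print(n, round(novelty(output, training, n), 2))
```

Short windows are all copied from the corpus, while the full sentence is not. Where "remixing" ends and "novel" begins therefore depends on which n you pick.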

"without any reference to meaning" veers into the philosophical (see how much "intent" is brought up in the linked post today). But has anything been proven wrong about the idea that the text prediction is based on probabilistic evaluation based on a model's training data? E.g. how can you prove "reasoning" vs "stochastic simulated reasoning" here?

Perhaps a useful counterfactual (but hopelessly-expensive/possibly-infeasible) would be to see if you could program a completely irrational LLM. Would such a model be able to "reason" its way into realizing its entire training data was based on fallacies and intentionally-misleading statements and connections, or would it produce consistent-with-its-training-but-logically-wrong rebuttals to attempts to "teach" it the truth?

libraryofbabel 52 minutes ago|||
Maybe, but a claim about what an LLM is not is still a claim about what it can or cannot do. And specifically:

> without any reference to meaning

is vague, but I read it as actually quite a strong claim about the limitations of LLMs. I don’t think it would be possible for LLMs to do long chains of correct mathematical reasoning about novel problems that they haven’t seen before “without any reference to meaning.” That simply isn’t possible just by regurgitating and remixing random chunks of training data. Therefore I consider the stochastic parrots picture of LLMs to be wrong.

It might have been an accurate picture in 2020. It is not an accurate picture now. What is often missed in these discussions is that LLM training now looks totally different than it did a couple years ago. RLVR completely changed the game, allowing LLMs to actually do math and code well, among other things.

mbauman 2 hours ago|||
> stochastic parrots thesis (if it has any empirically-predictive content at all

Did you read TFA? This is precisely one of the non-questions that she answers.

kalkin 2 hours ago||
Yes, she addresses this by denying that she's made any empirical hypothesis, but in a way that's some combination of disingenuous and confused.

She also says:

> What I am trying to do... is to help people understand what these systems actually are

Can a phrase that has no empirical content aid people in understanding an empirical phenomenon?

> the astonishing willingness of so many to... turn to synthetic text... for all kinds of weighty decisions.

Why is this astonishing, if the nature of these models as "stochastic parrots" places no limitations whatsoever on their empirical capabilities, reliability, etc.?

> the field of linguistics is particularly relevant in this moment, as a linguist’s eye view on language technology is desperately needed to help make wise decisions about how we do and don’t use these products

Is it wise to make decisions about a product on the basis of information that has no relevance to how it is actually likely to behave?

(It may be, if one has ethical concerns with "data theft, the exploitative labor practices", etc -- but one could have such concerns about any kind of product, not just a "stochastic parrot", and linguists are not generally academia's experts on, e.g., labor practices.)

harpiaharpyja 2 hours ago|||
...did you read TFA?
tootie 2 hours ago|||
She says explicitly it's not an empirical hypothesis. It's just a label for how they function. Which hasn't really changed even as they've gotten more useful. I haven't followed the full drama but this post is her saying the term has been frequently misapplied and she's basically distancing herself from some critiques that were misinterpreting her intent.
libraryofbabel 2 hours ago||
> She says explicitly it's not an empirical hypothesis. It's just a label for how they function.

Then… what’s the point of the label, if it’s not making any empirically-meaningful claims about LLMs at all? I know that LLMs involve sampling over a distribution of output logits. I’ve written code to do it. So what? I know they have statistical elements. Yet I don’t go around calling LLMs stochastic parrots, because that label implies a whole lot of claims about LLMs that I don’t think are true any longer, like that they are just regurgitating and remixing training data and can’t successfully model structured systems (like mathematics or programming).
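For readers who haven't written the sampling step mentioned above, here is a minimal sketch. It assumes a made-up three-token vocabulary and made-up logits; real models score tens of thousands of tokens. Greedy decoding is fully deterministic, while sampling at a temperature is not. In both cases the odds are set by the logits, which come from training.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, rng, temperature=1.0):
    """Stochastic decoding: draw a token index from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

vocab = ["proof", "parrot", "banana"]  # hypothetical next-token candidates
logits = [2.0, 1.0, -1.0]              # hypothetical scores from a model

rng = random.Random(0)                 # seeded so the run is reproducible
draws = [vocab[sample(logits, rng, temperature=0.8)] for _ in range(1000)]
print("greedy:", vocab[greedy(logits)])
print("sampled counts:", {w: draws.count(w) for w in vocab})
```

Temperature only reshapes the distribution. It cannot make a token likely if the learned logits make it unlikely.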

roenxi 1 hour ago||
It is making an empirically-meaningful claim - it is observing what LLMs do in a neatly pithy way. It isn't a hypothesis though, because it doesn't try to explain anything.

> Yet I don’t go around calling LLMs stochastic parrots, because that label implies a whole lot of claims about LLMs that I don’t think are true any longer, like that they are just regurgitating and remixing training data and can’t successfully model structured systems.

The first part doesn't imply the second. It is nearly unarguable that all LLMs are doing is regurgitating and remixing training data. There aren't any significant inputs other than training data. It seems more likely that humans are doing the same operation the LLMs are when they model structured systems or exercise creativity - compressing data in efficient ways and then spitting it back out. "Humans are stochastic parrots" is an easy claim to defend.

gessha 1 hour ago|||
The appeal to authority is strong here. A stochastic parrot can be a useful tool, too.
seatsh 2 hours ago||
Gowers, Tao and Lichtman are especially impressed by the funding of math.inc and the AI for Math Fund, a joint venture of Renaissance Philanthropies and XTX Markets.

Renaissance Philanthropies is a front for VC companies.

They never publish allocated computational resources, prior art or any novel algorithm that is used in the LLMs. For all we know, all accounts that are known to work on math stunts get 20% of total compute.

In other words, they ignore prior art, do not investigate and just celebrate if they get a vibe math result. It isn't science, it is a disgrace.

newtonsmethod 2 hours ago||
Is your justification in dismissing Fields medalists that they are impressed by funding? Not even receiving it (I assume you say this because Tao is not funded by AI for Math, but rather an advisor for it)?

Not only would it be a leap to suggest that people automatically lose their integrity by taking funds for projects they believe are useful, especially after involvement with adjacent fields, but you are suggesting merely being impressed by a fund is enough to dismiss their views?

You also have no evidence that Renaissance Philanthropies is a front for VC companies. All news coverage indicates that they seek to be an alternative for high net worth individuals engaging in philanthropy.

Many people discovering Erdos results, engaging in Olympiads etc, are doing so with publicly available models and publish the resources used in the process.

sdf127 2 hours ago||
Renaissance "Philanthropy" brainwashes children with AI, which is child abuse:

https://www.renaissancephilanthropy.org/insights/renaissance...

https://www.renaissancephilanthropy.org/insights/embedding-a...

It promotes "agentic science", which will destroy science further:

https://www.renaissancephilanthropy.org/insights/open-source...

No one publishes. Please show me papers about the math proof logic in ChatGPT that are as detailed as those from Boyer/Moore/Kaufmann for prior work.

If they are on arxiv.org with 50 authors in a sea of slop, I didn't find them. If they exist, they are certainly not from Gowers, Tao or Lichtman.

You have the upper hand because your AI shills back you up here, but you have nothing of substance.

newtonsmethod 2 hours ago||
This is getting insane. You have no evidence for your initial claims and didn't respond to a thing I said, and are now claiming using AI for education is "child abuse". Please get help.
ashgt 1 hour ago||
There is also no evidence that Radio Free Europe is still linked to the CIA. Just look at the donors of Renaissance Misanthropy.

But we are feeding a sealion who does not know how the math proof logic in LLMs works, probably because it is a highly computationally expensive random restart hack calling Lean that is unpublishable.

newtonsmethod 1 hour ago||
Many of these results don't rely on repeatedly calling Lean. You have no clue what you're talking about.

> Just look at the donors of Renaissance Misanthropy.

If you're actually interested, who funds each project is listed in the PDF here. https://www.renaissancephilanthropy.org/annual-reports

As you can see, it's mainly philanthropic projects of wealthy families.

NooneAtAll3 55 minutes ago||
> Another common trope in the discourse around this phrase is to claim that stochastic parrot is an insult (or even a slur). On one reading, that would require LLMs to be the kind of thing that can take or feel offense, which they clearly aren’t.

isn't that circular reasoning?

"I can call anyone not smart enough to take offense because as I said those anyone aren't smart enough to take offense"?

(also disregarding that being offended has been shifted into "protection of the (perceived) weak (or of the group of your allegiance)" rather than "protection of self" for quite some time now)

---

but generally I always felt that this tension around the phrase was somewhat of a prescriptive/descriptive difference, or maybe a "level of detail in the model" type

just because a fuller understanding of the process exists doesn't mean other descriptions/models of the process are invalid or useless

newtonian gravity doesn't describe time dilation, and yet most of the time it is enough on its own, so it's successfully taught in schools and undergraduate courses

if the output of an LLM can be modeled (by intuition) as "some other being" for many practical uses *and the model works*, then automatically blaming others for "using a less precise model" and warning about it feels... strange

getnormality 46 minutes ago||
I think "stochastic parrot" misses the mark as a characterization of LLMs, but so does "artificial intelligence." They're both somewhat helpful and somewhat misleading in complementary ways.

Maybe that's the best one can do when describing something very new and strange. A series of vivid, incompatible metaphors might be the best guide for a while. "Intelligence" as we normally understand it is a significant overstatement, while "parrot" is a massive understatement.

leonidasv 3 hours ago||
What a hill to die on.
tibbar 1 hour ago||
I mean, we're pretty deep into Westworld/Blade Runner-style scifi at this point. It's actually a crazy, mind-bending question to try to grasp what is going on with chatclaudini at this point. Regardless of what labels we choose or properties we choose to affirm, we're far too deep into uncanny valley for it to be very helpful.
_wire_ 3 days ago||
Lovely article well worth attention by virtue of its regard for the cultural traits of terminology and its inflections, while also debunking the pervasive lore that "AI" devices are doing anything but the merest resemblance of thinking.

It's rare to read an author who can directly face Brandolini's Law of misinformation asymmetry and not only hold her own against the bullshit but overcome it.

skybrian 14 minutes ago||
You're dismissing LLM-generated text as the "merest resemblance of thinking" when the way it resembles thinking is becoming increasingly useful.

When I prompt a coding agent to fix a bug, it outputs text describing a hypothesis and more text that results in running shell commands to test the hypothesis. If the output shows that it guessed wrong, it outputs more text to test a different hypothesis, and more text to edit code, and in the end, the bug is fixed.

The text resembles the output of a reasoning process closely enough to actually work. Maybe, for some purposes, it doesn't matter if it's "real" or not?

What does "real" reasoning do for us that the imitation doesn't do? Does it come up with better hypotheses? Is it better at testing them? Sometimes, but not always. Human reasoning is more expensive, less available, and sometimes gets poor results.

CamperBob2 3 hours ago||
TIL that the "merest resemblance of thinking" is enough to take gold at IMO.
radkZ 3 hours ago|||
Automated theorem provers are not new, in fact they are very old. One of the most automated is ACL2, which uses the well studied waterfall method (unrelated to waterfall development).

LLMs certainly use something similar, except they understand text as input. LLMs, especially when used for marketing stunts, have way more computing power available than any theorem prover ever had. They probably do random restarts if a proof fails, which amounts to partial brute-forcing.

Lawrence Paulson correctly complained about some of the hype that Lean/LLMs are getting.

ACL2 even uses formulaic text output that describes the proof in human language, despite being all in Common Lisp and not a mythical clanker.

They do not think; they use old, well-established algorithms, or perhaps novel ones that were added.

CamperBob2 25 minutes ago|||
> LLMs certainly use something similar

They certainly do not. Read the papers where the IMO results were presented. No tools of any kind were used.

nsingh2 2 hours ago|||
Proof search isn't new, but I don't think that captures the value of LLMs.

They act as a learned proposal mechanism on top of hard search. Things like suggesting relevant lemmas, tactics, turning intent into formal steps, and ranking branches based on trained knowledge.

Maybe a kind of learned "intuition engine", from a large corpus of mathematical text, that still has to pass a formal checker. This is not really something we've had to this extent before.
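The "learned proposal on top of hard search, checked by a formal verifier" pattern can be sketched in miniature as best-first search. Here a scoring function ranks partial attempts and an independent checker replays the final answer. Everything in it is a hypothetical stand-in: the toy domain (reach a target integer with a few rewrite steps) is not a real proof system, and the "proposer" is a plain distance heuristic where a real system would use a trained model. It does not describe how Lean or any particular LLM pipeline works.

```python
import heapq

# Hypothetical toy domain: rewrite an integer into a target using these steps.
STEPS = [("add3", lambda x: x + 3), ("double", lambda x: x * 2), ("dec", lambda x: x - 1)]

def score(value, target):
    """Stand-in for a learned proposer: lower means 'looks closer to done'."""
    return abs(target - value)

def check(start, target, proof):
    """Independent verifier: replay the proposed steps and confirm the result exactly."""
    table = dict(STEPS)
    value = start
    for name in proof:
        value = table[name](value)
    return value == target

def search(start, target, max_nodes=10_000):
    """Best-first search: always expand the most promising partial attempt."""
    frontier = [(score(start, target), 0, start, [])]
    seen = {start}
    pushed = 0  # unique tie-breaker so heapq never compares lists
    while frontier and pushed < max_nodes:
        _, _, value, proof = heapq.heappop(frontier)
        if value == target:
            return proof
        for name, fn in STEPS:
            nxt = fn(value)
            if nxt in seen or abs(nxt) > 10 * abs(target) + 10:
                continue  # skip repeated states and runaway values
            seen.add(nxt)
            pushed += 1
            heapq.heappush(frontier, (score(nxt, target), pushed, nxt, proof + [name]))
    return None

proof = search(start=1, target=29)
print(proof, check(1, 29, proof) if proof else False)
```

The checker never trusts the search. A poor proposer only makes the search slower, and it cannot get a wrong answer past `check`.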

> They do not think

That claim seems less useful, unless “think” is defined in a way that predicts some difference in capability. If the objection is that LLMs are not conscious, fine, but that doesn't say much about whether they can help produce correct formal proofs.

scotty79 3 hours ago|||
And also create novel math proofs.
tom_ 2 hours ago||
Perhaps actual thinking is not automatically necessary for that either! - and the LLM is proof.
throw310822 1 hour ago||
Then what is thinking necessary for? Not for proving novel results; not for coding; not for writing prose; not for arguing a point; not for interpreting artworks; etc.
rafram 1 hour ago||
> not for writing prose; not for arguing a point; not for interpreting artworks

To be fair, LLMs are pretty bad at all of these. They struggle to avoid cliches and to produce prose with actual substance (below a stylistic facade that is undeniably convincing).

aaronbrethorst 1 hour ago||
> They struggle to avoid cliches and to produce prose with actual substance

I have bad news for you about the writings of most Ph.D.s and university professors...
