
Posted by pegasus 10/26/2025

A definition of AGI (arxiv.org)
305 points | 514 comments | page 2
jsheard 10/26/2025|
We'll know AGI has arrived when AGI researchers manage to go five minutes without publishing hallucinated citations.

https://x.com/m2saxon/status/1979349387391439198

artninja1988 10/26/2025||
Came from the Google Docs to BibTeX conversion apparently

https://x.com/m2saxon/status/1979636202295980299

nativeit 10/26/2025|||
I’m gonna start referring to my own lies as “hallucinations”. I like the implication that I’m not lying, but rather speaking truthfully, sincerely, and confidently about things that never happened and/or don’t exist. Seems paradoxical, but this is what we’re effectively suggesting with “hallucinations”. LLMs necessarily lack things like imagination, or an ego that’s concerned with the appearance of being informed and factually correct, or awareness of how a lack of truth and honesty may affect users and society. In my (not-terribly-informed) opinion, I’d assert that precludes LLMs from even approximate levels of intelligence. They’re either quasi-intelligent entities who routinely lie to us, or they are complex machines that identify patterns and reconstruct plausible-sounding blocks of text without any awareness of abstract concepts like “truth”.

Edit: toned down the preachiness.

bonoboTP 10/26/2025|||
This looks like a knee-jerk reaction to the title instead of anything substantial.
MichaelZuo 10/26/2025|||
It does seem a bit ridiculous…
CamperBob2 10/26/2025||
So infallibility is one of the necessary criteria for AGI? It does seem like a valid question to raise.

Edit due to rate-limiting, which in turn appears to be due to the inexplicable downvoting of my question: since you (JumpCrisscross) are imputing a human-like motivation to the model, it sounds like you're on the side of those who argue that AGI has already been achieved?

JumpCrisscross 10/26/2025||
> infallibility

Lying != fallibility.

cjbarber 10/26/2025||
Some AGI definition variables I see:

Is it about jobs/tasks, or cognitive capabilities? The majority of the AI-valley seems to focus on the former; TFA focuses on the latter.

Can it do tasks, or jobs? Jobs are bundles of tasks. AI might be able to do 90% of tasks for a given job, but not the whole job.

If tasks, what counts as a task: Is it only specific things with clear success criteria? That's easier.

Is scaffolding allowed: Does it need to be able to do the tasks/jobs without scaffolding and human-written few-shot prompts?

Today's tasks/jobs only, or does it include future ones too? As tasks and jobs get automated, jobs evolve and get re-defined. So, being able to do the future jobs too is much harder.

Remote only, or in-person too: In-person too is a much higher bar.

What threshold of tasks/jobs: "most" is apparently typically understood to mean 80-95% (Mira Ariel). Automating 80% of tasks is different from 90%, 95%, and 99%; diminishing returns. And how are the tasks counted: by frequency, dollar-weighted, or by unique count of tasks? (A toy sketch of how much that counting choice matters is at the end of this comment.)

Only economically valuable tasks/jobs, or does it include anything a human can do?

A high-order bit on many people's AGI timelines is which definition of AGI they're using, so clarifying the definition is nice.
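
To make the counting question concrete, here is a toy sketch (all numbers invented for illustration; nothing here is from TFA) of how the "fraction of tasks AI can do" shifts depending on whether you count unique tasks, weight by frequency, or weight by dollar value:

    # Toy example: the same set of tasks, three different ways of counting.
    # Every figure below is made up purely for illustration.
    tasks = [
        # (name, automatable_by_ai, times_per_week, dollars_per_week)
        ("triage inbound tickets",    True,  200, 1_000),
        ("write status report",       True,    5,   300),
        ("negotiate vendor contract", False,   1, 5_000),
        ("onboard new teammate",      False,   1, 2_000),
        ("update internal docs",      True,   10,   200),
    ]

    def automated_share(weight):
        total = sum(weight(t) for t in tasks)
        done = sum(weight(t) for t in tasks if t[1])
        return done / total

    print(f"by unique task count: {automated_share(lambda t: 1):.0%}")     # 60%
    print(f"by frequency:         {automated_share(lambda t: t[2]):.0%}")  # ~99%
    print(f"dollar-weighted:      {automated_share(lambda t: t[3]):.0%}")  # ~18%

The same automation level reads as 60%, 99%, or 18% of the job depending on the weighting, which is one reason the threshold question is hard to pin down.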

AstroBen 10/26/2025|
Not only tasks, but you need to look at the net effect

If it does an hour of tasks, but creates an additional hour of work for the worker...

vayup 10/27/2025||
Precisely defining what "Intelligence" is will get us 95% of the way in defining "Artificial General Intelligence". I don't think we are there yet.
vardump 10/26/2025||
Whatever the definition may be, the goalposts are usually moved once AI reaches that point.
kelseyfrog 10/26/2025||
There are at least two distinct bases for AGI refutations: behaviorist and ontological. They often get muddled.

I can't begin to count the number of times I've encountered someone who holds an ontological belief for why AGI cannot exist and then for some reason formulates it as a behavioralist criteria. This muddying of argument results in what looks like a moving of the goalposts. I'd encourage folks to be more clear whether they believe AGI is ontologically possible or impossible in addition to any behavioralist claims.

lo_zamoyski 10/27/2025||
> I can't begin to count the number of times I've encountered someone who holds an ontological belief for why AGI cannot exist and then for some reason formulates it as a behavioralist criteria.

Unclear to me what you mean. I would certainly reject the ontological possibility of intelligent computers, where computation is defined by the Church-Turing thesis. It's not rocket science, but it is something difficult for some people to see without a sound and basic grasp of metaphysics and the foundations of CS. Magical thinking and superstition come more easily then. (I've already given an explanation of this in other posts ad nauseam. In a number of cases, people get argumentative out of ignorance and misunderstanding.)

However, I don't reject out of hand the possibility of computers doing a pretty good job of simulating the appearance of intelligence. There's no robust reason to think that passing the Turing test implies intelligence. A good scarecrow looks human enough to many birds, but that doesn't mean it is human.

But the Turing test is not an especially rigorous test anyway. It appeals to the discernment of the observer, which is variable, and then there's the question of how much conversation or behavior, and in what range of circumstances, you need before you can make the call. Even in some unrealistic and idealized thought experiment, if a conversation with an AI were, to a perfectly discerning observer, completely indistinguishable from a conversation with a human being, it would nonetheless lack a causal account of what was observed. You would have shown only a perfect correlation, at best.

zahlman 10/26/2025|||
My experience has been more that the pro-AI people misunderstand where the goalposts were, and then complain when they're correctly pointed at.

The "Turing test" I always saw described in literature, and the examples of what passing output from a machine was imagined to look like, are nothing like what's claimed to pass nowadays. Honestly, a lot of the people claiming that contemporary chatbots pass come across like they would have thought ELIZA passed.

bonoboTP 10/26/2025||
Can you be more concrete? What kind of answer/conversation would demonstrate passing the test that you think is currently not possible?
tsimionescu 10/26/2025||
Ones in which both the human test takers and the human counterparts are actively trying to prove to each other that they are actually human.

With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even though you're trying to change the conversation, go on a complete non-sequitur, swear at you, etc. These are all things that the human "controls" should be doing to prove to the judges that they are indeed human.

LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But if you're left alone with the raw output for more than a few lines, and you're actively trying to tell whether you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.

bonoboTP 10/26/2025||
Okay, but we are not really optimizing them to emulate humans right now. In fact, it's the opposite. The mainstream bots are explicitly trained not to identify as humans and to refuse to claim having thoughts, internal feelings, or consciousness.

So in that sense it's a triviality. You can ask ChatGPT whether it's human and it will say no upfront. And it has various guardrails in place against too much "roleplay", so you can't just instruct it to act human. You'd need a different post-training setup.

I'm not aware whether anyone did that with open models already.

tsimionescu 10/26/2025|||
Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful. If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform. In contrast, a human can both be human, and be good at their job - this is the standard by which we should judge these machines. If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
famouswaffles 10/26/2025||
>Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful.

Post-training them to speak like a bot and deny being human has no effect on how useful they are. That's just an OpenAI/Google/Anthropic preference.

>If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform

Raw models are not worse. Literally every model release paper that compares both shows them as better at benchmarks, if anything. Post-training degrading performance is a well-known phenomenon. What they are is more difficult to guide/control. Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.

It's beside the point anyway because, again, you don't have to post-train them to act as anything other than a human.

>If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.

Okay, but that's not the case.

tsimionescu 10/26/2025||
> Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.

This is exactly what I was referring to.

famouswaffles 10/26/2025||
You are talking about instruction tuning. You can perform instruction tuning without making your models go out of the way to tell you they are not human, and it changes literally nothing about their usefulness. Their behavior does not have to be restricted this way to get them useful/instruction tuned. So your premise is wrong.
zahlman 10/26/2025|||
> Okay but we are not really optimizing them to emulate humans right now.

But that is exactly the point of the Turing test.

bonoboTP 10/26/2025||
Ok, but then it doesn't make sense to dismiss AI based on that. It fails the Turing test because its creators intentionally don't even try to make something that is good at the (strictly defined) Turing test.

If someone really wants to see a Turing-passing bot, I guess they could try making one, but I'm doubtful it would be of much use.

Anyway, people forget that the thought experiment by Turing was a rhetorical device, not something he envisioned to build. The point was to say that semantic debates about "intelligence" are distractions.

krige 10/26/2025|||
Are you saying that we already have AGI, except those pesky goalpost movers keep denying the truth? Hm.
NitpickLawyer 10/26/2025|||
I'd say yes, by at least one old definition made by someone who was at the time in a position to have a definition.

When deepmind was founded (2010) their definition was the following: AI is a system that learns to perform one thing; AGI is a system that learns to perform many things at the same time.

I would say that whatever we have today, "as a system", matches that definition. In other words, the "system" that is, say, gpt5/gemini3/etc has learned to "do" (while "do" is debatable) a lot of tasks (read/write/play chess/code/etc) "at the same time". And from a "pure" ML point of view, it learned those things from the "simple" core objective of next-token prediction (+ enhancements later, RL, etc). That is pretty cool.

So I can see that as an argument for "yes".

But, even the person who had that definition has "moved the goalposts" of his own definition. From recent interviews, Hassabis has moved towards a definition that resembles the one from this paper linked here. So there's that. We are all moving the goalposts.

And it's not a recent thing. People did this back in the 80s. There's the famous "As soon as AI does something, it ceases to be AI" or paraphrased "AI is everything that hasn't been done yet".

bossyTeacher 10/26/2025|||
> AGI is a system that learns to perform many things at the same time.

What counts as a "thing"? Because arguably some of the deep ANNs pre-transfomers would also qualify as AGI but no one would consider them intelligent (not in the human or animal sense of intelligence).

And you probably don't even need fancy neural networks. Get a RL algorithm and a properly mapped solution space and it will learn to do whatever you want as long as the problem can be mapped.

wahnfrieden 10/26/2025||||
Can you cite the Deepmind definition? No Google results for that.
NitpickLawyer 10/26/2025||
It's from a documentary that tracks Hassabis' life. I c/p from an old comment of mine (the quotes are from the documentary, can probably look up timestamps if you need, but it's in the first ~15 minutes I'd say, when they cover the first days of Deepmind):

----

In 2010, one of the first "presentations" given at Deepmind by Hassabis, had a few slides on AGI (from the movie/documentary "The Thinking Game"):

Quote from Shane Legg: "Our mission was to build an AGI - an artificial general intelligence, and so that means that we need a system which is general - it doesn't learn to do one specific thing. That's really key part of human intelligence, learn to do many many things".

Quote from Hassabis: "So, what is our mission? We summarise it as <Build the world's first general learning machine>. So we always stress the word general and learning here the key things."

And the key slide (that I think cements the difference between what AGI stood for then, vs. now):

AI - one task vs. AGI - many tasks

at human level intelligence.

darepublic 10/26/2025|||
It doesn't play chess? Just can parrot it very well
NitpickLawyer 10/26/2025||
Yeah, maybe. But what matters is the end result. In the kaggle match, one of the games from the finals (grok vs o3) is rated by chesscom's stockfish as 1900 vs 2500. That is, they played a game at around those ratings.

For reference, the average chesscom player is ~900 Elo, while the average FIDE-rated player is ~1600. So, yeah. Parrot or not, the LLMs can make moves above the average player. Whatever that means.
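
For a rough sense of what those ratings imply, here is the standard Elo expected-score formula applied to the numbers quoted above (just the textbook formula, nothing taken from the kaggle harness itself):

    # Standard Elo expected score: E_A = 1 / (1 + 10^((R_B - R_A) / 400))
    def expected_score(r_a: float, r_b: float) -> float:
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    print(f"1900 vs 2500: {expected_score(1900, 2500):.1%}")  # ~3% expected score for the weaker side
    print(f"1900 vs  900: {expected_score(1900,  900):.1%}")  # ~99.7% against an average chesscom player

So a 1900-rated player is badly outmatched by a 2500 opponent, but near-certain to beat the ~900 average.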

darepublic 10/26/2025||
I believe it will make illegal moves (unaided by any tools ofc). It will also make mistakes doing things like not being able to construct the board correctly given a fen string. For these reasons I consider long strings of correct moves insufficient to say it can play the game. If my first two statements, about a propensity for illegal moves and other fails on "easy for humans" tasks were untrue then I would reconsider.
NitpickLawyer 10/26/2025||
In the kaggle test they considered the match forfeit if the model could not produce a legal move after 3 tries (none of the matches in the finals were forfeited, they all ended with checkmate on the board). Again, chesscom's interface won't let you make illegal moves, and the average there is 900. Take that as you will.
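
A rough sketch of that kind of legality gate, using the python-chess library (the 3-try rule is as described in this thread, not the kaggle harness's actual code, and ask_model_for_move is a hypothetical stand-in for the LLM call):

    import chess  # pip install python-chess

    def next_move(board: chess.Board, ask_model_for_move, max_tries: int = 3):
        """Ask the model for a SAN move; retry on illegal output, else signal forfeit."""
        for _ in range(max_tries):
            san = ask_model_for_move(board.fen())  # model is shown the current position
            try:
                return board.parse_san(san)        # raises ValueError on illegal or garbled moves
            except ValueError:
                continue                           # re-prompt, up to max_tries
        return None                                # no legal move produced: treat as forfeit
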
vardump 10/26/2025||||
No, just what has usually happened in the past with AI goalposts.

At first, just playing chess was considered to be a sign of intelligence. Of course, that was wrong, but not obvious at all in 1950.

krige 10/27/2025||
You know, as the saying goes, if a metric becomes a target...
empath75 10/26/2025||||
I don't think AGI is a useful concept, but if it exists at all, there's a very good argument that LLMs had it as soon as they could pass the Turing test reliably, which they accomplished years ago at this point.
root_axis 10/27/2025||
LLMs do not pass the Turing test. It's very easy to know if you're speaking with one.
A4ET8a8uTh0_v2 10/26/2025||||
I think, given some of the signs on the horizon, there is a level of MAD-type bluffing going around, but some of the actions by various power centers suggest it is either close, people think it's close, or it is there.
derektank 10/26/2025|||
It wasn't the best definition of AGI, but I think if you had asked an interested layman 5 years ago whether a system that can pass the Turing test was AGI, they would have said yes
jltsiren 10/26/2025|||
An interested but uninformed layman.

When I was in college ~25 years ago, I took a class on the philosophy of AI. People had come up with a lot of weird ideas about AI, but there was one almost universal conclusion: that the Turing test is not a good test for intelligence.

The least weird objection was that the premise of the Turing test is unscientific. It sees "this system is intelligent" as a logical statement and seeks to prove or disprove it in an abstract model. But if you perform an experiment to determine if a real-world system is intelligent, the right conclusion for the system passing the test is that the system may be intelligent, but a different experiment might show that it's not.

nativeit 10/26/2025|||
Douglas Hofstadter wrote Gödel, Escher, Bach nearly 50 years ago, and it won a Pulitzer Prize and the National Book Award and got featured in the popular press. It’s been on lots of college reading lists, and online coursework for high school students based on it was available from MIT as early as 2007. The FBI concluded that the 2001 anthrax scare was in part inspired by elements of the book, which was found in the attacker’s trash. Anyone who’s wanted to engage with the theories and philosophy surrounding artificial intelligence has had plenty of material that gets fairly in-depth asking and exploring these same questions.

It seems like a lot of people think this is all bleeding-edge novelty (at least, the underlying philosophical and academic ideas getting discussed in popular media), but the whole industry is predicated on ideas that are very old philosophy + decades-old established technology + relatively recent neuroscience + modern financial engineering. That said, I don’t want to suggest a layperson is likely to have engaged with any of it, so I understand why this will be the first time a lot of people will have ever considered some of these questions. I imagine what I’m feeling is fairly common to anyone who’s got a very niche interest that blows up and becomes the topic of interest for the entire world.

I think there are probably some very interesting, as-yet undocumented phenomena occurring as the product of the unbelievably vast amount of resources sunk into what’s otherwise a fairly niche kind of utility (LLMs specifically, and machine learning more broadly). I’m optimistic that there will be some very transformational technologies to come from it, although whether it will produce anything like “AGI”, or ever justify these levels of investment? Both seem rather unlikely.
MattRix 10/26/2025|||
Isn’t that the point of trying to define it in a more rigorous way, like this paper is doing?
bigyabai 10/26/2025|||
The authors acknowledge that this is entirely possible. Their work is just grounded in theory, after all:

> we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition.

righthand 10/26/2025|||
I agree: if our comprehension of intelligence and "life" is incomplete, so is our model for artificial intelligence.
rafram 10/26/2025||
Are you claiming that LLMs have achieved AGI?
moffkalast 10/26/2025||
Compared to everything that came before they are fairly general alright.
empath75 10/26/2025||
This is a serious paper by serious people and it is worth reading, but any definition of intelligence that depends on human beings as reference will never be a good basis for evaluating non human intelligence.

You could easily write the reverse of this paper that questions whether human beings have general intelligence by listing all the things that LLMs can do, which human beings can't -- for example producing a reasonably accurate summary of a paper in a few seconds or speaking hundreds of different languages with reasonable fluency.

You can always cherry-pick stuff that humans are capable of that LLMs are not, and vice versa, and I don't think there is any reason to privilege certain capabilities over others.

I personally do not believe that "General Intelligence" exists as a quantifiable feature of reality, whether in humans or machines. It's phlogiston, it's the luminiferous ether. It's a dead metaphor.

I think what is more interesting is focusing on _specific capabilities_ that are lacking and how to solve each of them. I don't think it's at all _cheating_ to supplement LLM's with tool use, RAG, the ability to run python code. If intelligence can be said to exist at all, it is as part of a system, and even human intelligence is not entirely located in the brain, but is distributed throughout the body. Even a lot of what people generally think of as intelligence -- the ability to reason and solve logic and math problems typically requires people to _write stuff down_ -- ie, use external tools and work through a process mechanically.

mitthrowaway2 10/26/2025||
Quite the list of authors. If they all personally approved the text, it's an achievement in itself just to get all of them to agree on a definition.
mrsvanwinkle 10/26/2025|
Indeed, I am wondering whether these HN commenters have any idea whose shoulders they are rubbing with their dismissive confidence.
optimalsolver 10/26/2025||
Maybe one of these exalted names should've proof-read the paper:

https://x.com/m2saxon/status/1979349387391439198

Der_Einzige 10/26/2025||
Most people who say "AGI" really mean either "ASI" or "Recursive Self Improvement".

AGI was already here the day ChatGPT released: That's Peter Norvig's take too: https://www.noemamag.com/artificial-general-intelligence-is-...

mitthrowaway2 10/26/2025||
The reason some people treat these as equivalent is that AI algorithm research is one of the things a well-educated adult human can do, so an AGI who commits to that task should be able to improve itself, and if it makes a substantial improvement, then it would become or be replaced by an ASI.

To some people this is self-evident so the terms are equivalent, but it does require some extra assumptions: that the AI would spend time developing AI, that human intelligence isn't already the maximum reachable limit, and that the AGI really is an AGI capable of novel research beyond parroting from its training set.

I think those assumptions are pretty easy to grant, but to some people they're obviously true and to others they're obviously false. So depending on your views on those, AGI and ASI will or will not mean the same thing.

photonthug 10/27/2025||
Funny but the eyebrow-raising phrase 'recursive self-improvement' is mentioned in TFA in an example about "style adherence" that's completely unrelated to the concept. Pretty clearly a scam where authors are trying to hack searches.

Prerequisite for recursive self-improvement and far short of ASI, any conception of AGI really really needs to be expanded to include some kind of self-model. This is conspicuously missing from TFA. Related basic questions are: What's in the training set? What's the confidence on any given answer? How much of the network is actually required for answering any given question?

Partly this stuff is just hard and mechanistic interpretability as a field is still trying to get traction in many ways, but also, the whole thing is kind of fundamentally not aligned with corporate / commercial interests. Still, anything that you might want to call intelligent has a working self-model with some access to information about internal status. Things that are mentioned in TFA (like working memory) might be involved and necessary, but don't really seem sufficient

tsoukase 10/27/2025||
My takes as a neuroscientist:

1) defining intelligence is very difficult, almost impossible; defining the artificial kind even more so

2) there are many types of human intelligence. Verbal intelligence is one of them and the closest for comparison with LLMs

3) machines (not only LLMs but all of them, robots included) excel where humans are bad and vice versa, without exception, due to their different backgrounds. Comparing the two is totally meaningless and unfair to both. Let's have each complement the other.

4) AGI remains a valid target, but we are still very far from it, as we are from other grand goals: controlling DNA to treat arbitrary genetic diseases, solving Earth's resource problems and harnessing other planets, creating a near-perfect sociopolitical system with no inequality. Another Singularity is added to the list

5) I am impressed by how far a PC cluster has come through "shuffling tokens", but on the other hand I am pessimistic about how much further it can go given finite input/training data.

throwanem 10/26/2025|
How, summing (not averaging) to 58 of 1000 possible points (0-100 in each of ten domains), are we calling this score 58% rather than 5.8%?
NitpickLawyer 10/26/2025||
It's confusing. Each of the 10 tracks is worth 10% of the total, and the headline score is the plain sum of the per-track contributions. So in the first table, 10% on math means essentially "perfect" math, not 10% of the math track.
alexwebb2 10/26/2025||
0-10 in each domain. It’s a weird table.
jagrsw 10/27/2025||
The simple additive scoring here is sus. It means a model that's perfect on 9/10 axes but scores 0% on Speed (i.e., takes effectively infinite time to produce a result) would be considered "90% AGI".

By this logic, a vast parallel search running on Commodore 64s that produces an answer after BusyBeaver(100) years would be almost AGI, which doesn't pass the sniff test.

A more meaningful metric would be more multiplicative in nature.
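
To see the difference, a toy sketch (made-up scores, not the paper's numbers): nine axes at 100% and one axis, say speed, at 0%. The additive rule described above works out to 90%, while a multiplicative rule such as the geometric mean collapses to zero:

    from math import prod

    scores = [1.0] * 9 + [0.0]   # nine perfect axes, one (e.g. speed) at zero

    additive = sum(scores) / len(scores)           # 0.90 -> reported as "90% AGI"
    geometric = prod(scores) ** (1 / len(scores))  # 0.00 -> not AGI at all

    print(f"additive score:       {additive:.0%}")
    print(f"multiplicative score: {geometric:.0%}")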
