Posted by pegasus 2 days ago

A definition of AGI (arxiv.org)
304 points | 505 comments | page 2
xnx 2 days ago|
I like François Chollet's definition of AGI as a system that can efficiently acquire new skills outside its training data.
killthebuddha 2 days ago||
I really appreciate his iconoclasm right now, but every time I engage with his ideas I come away feeling shortchanged. I’m always like “there is no such thing as outside the training data”. What’s inside and what’s outside the training data is at least as ill-defined as “what is AGI”.
moffkalast 2 days ago|||
So... AGI is a few shot performance metric?
zulban 2 days ago||
Not bad. Maybe.

But maybe that's ASI. Whereas I consider chatgpt 3 to be "baby AGI". That's why it became so popular so fast.

JumpCrisscross 2 days ago||
> I consider chatgpt 3 to be "baby AGI". That's why it became so popular so fast

ChatGPT became popular because it was easy to use and amusing. (LLM UX until then had been crappy.)

Not sure AGI aspirations had anything to do with uptake.

zulban 2 days ago||
ChatGPT 3 was the first AI that could do 100,000 different things poorly. Before that we only had AIs that could do a few things decently, or very well. So yeah, I'm sticking with "baby AGI" because of the "G".
JumpCrisscross 2 days ago|||
> I'm sticking with "baby AGI" because of the "G"

I don't have an opinion on whether ChatGPT qualifies as AGI. What I'm saying is where one stands on that question has nothing to do with "why it became so popular so fast."

(Also, several machine-learning techniques could do millions of things terribly before LLMs. GPT does them, and other things, less poorly. It's a broadening. But I suppose really any intelligence of any kind can be considered a "baby" AGI.)

ben_w 1 day ago|||
Do you mean ChatGPT-3.5, or GPT-3?

The "ChatGPT" web app started with the underlying model GPT-3.5

The predecessor models, a whole series of them collectively called "GPT-3" but sold under the API with names like "davinci" and "ada", were barely noticed outside AI research circles.

GPT-3 was useful, but you had to treat it as a text-completion system, not a chat interface; your prompt would have been e.g.

  Press release
  Subject: President announces imminent asteroid impact, evacuation of Florida
  My fellow Americans,
Because if you didn't put "My fellow Americans," in there, it would then suggest a bunch of other press release subjects.
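
Roughly, driving it through the API looked something like this (a sketch using the legacy pre-1.0 openai Python SDK, parameter names from memory, so treat the details as approximate):

  import openai  # legacy pre-1.0 SDK; the current SDK uses a client object instead

  openai.api_key = "sk-..."

  prompt = (
      "Press release\n"
      "Subject: President announces imminent asteroid impact, evacuation of Florida\n"
      "My fellow Americans,"
  )
  # A GPT-3 base model ("davinci"): pure text completion, no chat template.
  resp = openai.Completion.create(
      engine="davinci",
      prompt=prompt,
      max_tokens=200,
      temperature=0.7,
  )
  print(resp["choices"][0]["text"])  # continues the press release from the trailing comma

The whole trick was shaping the prompt so that the most plausible continuation was the text you actually wanted.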
SalmoShalazar 2 days ago||
I find the nature of AGI discussion to be so narrow and tedious. Intelligence is incomprehensibly more than being able to generate text that looks convincingly like a human wrote it. The coordination of a physical body, the formation of novel thoughts, the translation of thoughts to action, understanding the consequences of those actions, and so on. There’s so much missing that is required to even approach a literal human infant’s “intelligence” that it feels like I’m going crazy entertaining people’s arguments that we are approaching “AGI”.
jsheard 2 days ago||
We'll know AGI has arrived when AGI researchers manage to go five minutes without publishing hallucinated citations.

https://x.com/m2saxon/status/1979349387391439198

artninja1988 2 days ago||
Came from the Google Docs to BibTeX conversion apparently

https://x.com/m2saxon/status/1979636202295980299

nativeit 2 days ago|||
I’m gonna start referring to my own lies as “hallucinations”. I like the implication that I’m not lying, but rather speaking truthfully, sincerely, and confidently about things that never happened and/or don’t exist. It seems paradoxical, but this is what we’re effectively suggesting with “hallucinations”. LLMs necessarily lack things like imagination, or an ego that’s concerned with the appearance of being informed and factually correct, or awareness of how a lack of truth and honesty may affect users and society. In my (not-terribly-informed) opinion, that precludes LLMs from even approximate levels of intelligence. They’re either quasi-intelligent entities who routinely lie to us, or they are complex machines that identify patterns and reconstruct plausible-sounding blocks of text without any awareness of abstract concepts like “truth”.

Edit: toned down the preachiness.

bonoboTP 2 days ago|||
This looks like a knee-jerk reaction to the title instead of anything substantial.
MichaelZuo 2 days ago|||
It does seem a bit ridiculous…
CamperBob2 2 days ago||
So infallibility is one of the necessary criteria for AGI? It does seem like a valid question to raise.

Edit due to rate-limiting, which in turn appears to be due to the inexplicable downvoting of my question: since you (JumpCrisscross) are imputing a human-like motivation to the model, it sounds like you're on the side of those who argue that AGI has already been achieved?

JumpCrisscross 2 days ago||
> infallibility

Lying != fallibility.

cjbarber 2 days ago||
Some AGI definition variables I see:

Is it about jobs/tasks, or cognitive capabilities? The majority of the AI valley seems to focus on the former; TFA focuses on the latter.

Can it do tasks, or jobs? Jobs are bundles of tasks. AI might be able to do 90% of tasks for a given job, but not the whole job.

If tasks, what counts as a task: Is it only specific things with clear success criteria? That's easier.

Is scaffolding allowed: Does it need to be able to do the tasks/jobs without scaffolding and human-written few-shot prompts?

Today's tasks/jobs only, or does it include future ones too? As tasks and jobs get automated, jobs evolve and get re-defined. So, being able to do the future jobs too is much harder.

Remote only, or in-person too: In-person too is a much higher bar.

What threshold of tasks/jobs: "most" is apparently typically understood to mean 80-95% (Mira Ariel). Automating 80% of tasks is different from 90%, 95%, and 99%; the returns diminish. And how are the tasks counted: by frequency, dollar-weighted, or by unique count of tasks? (A toy illustration of how much the weighting matters is sketched at the end of this comment.)

Only economically valuable tasks/jobs, or does it include anything a human can do?

A high-order bit on many people's AGI timelines is which definition of AGI they're using, so clarifying the definition is nice.
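
As a toy illustration of the counting question (hypothetical numbers, just to show that the weighting choice dominates the headline percentage):

  # Same hypothetical job, three different "fraction automated" answers
  tasks = [
      # (name, hours_per_week, dollar_value, automatable_today)
      ("draft emails",       10,  200, True),
      ("write reports",       8,  400, True),
      ("client negotiation",  2, 2000, False),
  ]

  by_count  = sum(a for *_, a in tasks) / len(tasks)
  by_hours  = sum(h for _, h, _, a in tasks if a) / sum(h for _, h, _, _ in tasks)
  by_dollar = sum(d for _, _, d, a in tasks if a) / sum(d for _, _, d, _ in tasks)

  print(by_count, by_hours, by_dollar)  # ~0.67, 0.90, ~0.23

So "automates most of the job" can mean 90% or 23% depending on whether you weight by hours or by value.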

AstroBen 2 days ago|
Not only tasks, but you need to look at the net effect

If it does an hour of tasks, but creates an additional hour of work for the worker...

vayup 2 days ago||
Precisely defining what "Intelligence" is will get us 95% of the way in defining "Artificial General Intelligence". I don't think we are there yet.
tsoukase 1 day ago||
My takes as a neuroscientist:

1) Defining intelligence is very difficult, almost impossible. Defining artificial intelligence is even harder.

2) There are many types of human intelligence. Verbal intelligence is one of them, and the closest for comparison with LLMs.

3) Machines (not only LLMs but all of them, like robots) excel where humans are bad and vice versa, without exception, because of their different backgrounds. Comparing the two is meaningless and unfair to both. Let each complement the other.

4) AGI remains a valid target, but we are still very far from it, as with other grand goals: controlling DNA to treat arbitrary genetic diseases, solving Earth's resource problems and harnessing other planets, creating a near-perfect sociopolitical system with no inequality. Another Singularity added to the list.

5) I am impressed by how far a PC cluster has come by "shuffling tokens", but on the other hand I am pessimistic about how much further it can go with finite input/training data.

vardump 2 days ago||
Whatever the definition may be, the goalposts are usually moved once AI reaches that point.
kelseyfrog 2 days ago||
There are at least two distinct bases for AGI refutations: behaviorist and ontological. They often get muddled.

I can't begin to count the number of times I've encountered someone who holds an ontological belief about why AGI cannot exist and then for some reason formulates it as a behaviorist criterion. This muddying of argument results in what looks like a moving of the goalposts. I'd encourage folks to be clearer about whether they believe AGI is ontologically possible or impossible, in addition to any behaviorist claims.

lo_zamoyski 1 day ago||
> I can't begin to count the number of times I've encountered someone who holds an ontological belief about why AGI cannot exist and then for some reason formulates it as a behaviorist criterion.

Unclear to me what you mean. I would certainly reject the ontological possibility of intelligent computers, where computation is defined by the Church-Turing thesis. It's not rocket science, but it is difficult for some people to see without a sound, basic grasp of metaphysics and the foundations of CS. Magical thinking and superstition come more easily then. (I've already given an explanation of this in other posts ad nauseam. In a number of cases, people get argumentative out of ignorance and misunderstanding.)

However, I don't reject out of hand the possibility of computers doing a pretty good job of simulating the appearance of intelligence. There's no robust reason to think that passing the Turing test implies intelligence. A good scarecrow looks human enough to many birds, but that doesn't mean it is human.

But the Turing test is not an especially rigorous test anyway. It appeals to the discernment of the observer, which is variable, and then there's the question of how much conversation or behavior, and in what range of circumstances, you need before you can make the call. Even in some unrealistic, idealized thought experiment where a conversation with an AI were completely indiscernible from a conversation with a human being, even to an observer with perfect discernment, it would nonetheless lack a causal account of what was observed. You would have shown only a perfect correlation, at best.

zahlman 2 days ago|||
My experience has been more that the pro-AI people misunderstand where the goalposts were, and then complain when they're correctly pointed at.

The "Turing test" I always saw described in literature, and the examples of what passing output from a machine was imagined to look like, are nothing like what's claimed to pass nowadays. Honestly, a lot of the people claiming that contemporary chatbots pass come across like they would have thought ELIZA passed.

bonoboTP 2 days ago||
Can you be more concrete? What kind of answer/conversation would you see as demonstrating a pass of the test that you think is currently not possible?
tsimionescu 2 days ago||
Ones in which both the human test takers and the human counterparts are actively trying to prove to each other that they are actually human.

With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even though you're trying to change the conversation, go off on a complete non sequitur, swear at you, etc. These are all things that the human "controls" should be doing to prove to the judges that they are indeed human.

LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But if you're left alone to interact with the raw output for more than a few lines, and you're actively trying to tell whether you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.

bonoboTP 2 days ago||
Okay, but we are not really optimizing them to emulate humans right now. In fact, it's the opposite. The mainstream bots are explicitly trained not to identify as humans and to refuse to claim to have thoughts or internal feelings or consciousness.

So in that sense it's a triviality. You can ask ChatGPT whether it's human and it will say no upfront. And it has various guardrails in place against too much "roleplay", so you can't just instruct it to act human. You'd need a different post-training setup.

I'm not aware whether anyone did that with open models already.

tsimionescu 2 days ago|||
Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful. If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform. In contrast, a human can both be human, and be good at their job - this is the standard by which we should judge these machines. If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
og_kalu 2 days ago||
>Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful.

Post-training them to speak like a bot and deny being human has no effect on how useful they are. That's just an OpenAI/Google/Anthropic preference.

>If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform

Raw models are not worse. Literally every model release paper that compares both shows them as better on benchmarks, if anything. Post-training degrading performance is a well-known phenomenon. What they are is more difficult to guide/control. Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.

It's beside the point anyway because, again, you don't have to post-train them to act as anything other than a human.

>If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.

Okay, but that's not the case.

tsimionescu 2 days ago||
> Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.

This is exactly what I was referring to.

og_kalu 2 days ago||
You are talking about instruction tuning. You can perform instruction tuning without making your models go out of their way to tell you they are not human, and it changes literally nothing about their usefulness. Their behavior does not have to be restricted this way to make them useful/instruction-tuned. So your premise is wrong.
zahlman 2 days ago|||
> Okay but we are not really optimizing them to emulate humans right now.

But that is exactly the point of the Turing test.

bonoboTP 2 days ago||
Ok, but then it doesn't make sense to dismiss AI based on that. It fails the Turing test because its creators intentionally don't even try to make something that is good at the (strictly defined) Turing test.

If someone really wants to see a Turing-passing bot, I guess someone could try making one but I'm doubtful it would be of much use.

Anyway, people forget that the thought experiment by Turing was a rhetorical device, not something he envisioned building. The point was to say that semantic debates about "intelligence" are distractions.

krige 2 days ago|||
Are you saying that we already have AGI, except those pesky goalpost movers keep denying the truth? Hm.
NitpickLawyer 2 days ago|||
I'd say yes, by at least one old definition made by someone who was at the time in a position to have a definition.

When deepmind was founded (2010) their definition was the following: AI is a system that learns to perform one thing; AGI is a system that learns to perform many things at the same time.

I would say that whatever we have today, "as a system", matches that definition. In other words, the "system" that is, say, gpt5/gemini3/etc. has learned to "do" (though "do" is debatable) a lot of tasks (read/write/play chess/code/etc.) "at the same time". And from a "pure" ML point of view, it learned those things from the "simple" core objective of next-token prediction (plus enhancements later: RL, etc.). That is pretty cool.
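
(For concreteness, that core objective really is this simple. A minimal sketch below, using a toy PyTorch stand-in for the model; the real systems are huge transformers trained at enormous scale, but the loss has the same shape:)

  import torch
  import torch.nn.functional as F

  vocab, dim = 1000, 64
  emb = torch.nn.Embedding(vocab, dim)   # toy stand-in for the transformer stack
  head = torch.nn.Linear(dim, vocab)     # maps hidden states to next-token logits

  tokens = torch.randint(0, vocab, (1, 16))  # one sequence of 16 token ids
  hidden = emb(tokens[:, :-1])               # condition on tokens 0..14
  logits = head(hidden)                      # predict tokens 1..15
  loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
  loss.backward()                            # this gradient is the whole training signal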

So I can see that as an argument for "yes".

But, even the person who had that definition has "moved the goalposts" of his own definition. From recent interviews, Hassabis has moved towards a definition that resembles the one from this paper linked here. So there's that. We are all moving the goalposts.

And it's not a recent thing. People did this back in the 80s. There's the famous "As soon as AI does something, it ceases to be AI" or paraphrased "AI is everything that hasn't been done yet".

bossyTeacher 2 days ago|||
> AGI is a system that learns to perform many things at the same time.

What counts as a "thing"? Because arguably some of the pre-transformer deep ANNs would also qualify as AGI, but no one would consider them intelligent (not in the human or animal sense of intelligence).

And you probably don't even need fancy neural networks. Get a RL algorithm and a properly mapped solution space and it will learn to do whatever you want as long as the problem can be mapped.

wahnfrieden 2 days ago||||
Can you cite the Deepmind definition? No Google results for that.
NitpickLawyer 2 days ago||
It's from a documentary that tracks Hassabis' life. I c/p from an old comment of mine (the quotes are from the documentary, can probably look up timestamps if you need, but it's in the first ~15 minutes I'd say, when they cover the first days of Deepmind):

----

In 2010, one of the first "presentations" given at Deepmind by Hassabis, had a few slides on AGI (from the movie/documentary "The Thinking Game"):

Quote from Shane Legg: "Our mission was to build an AGI - an artificial general intelligence, and so that means that we need a system which is general - it doesn't learn to do one specific thing. That's really key part of human intelligence, learn to do many many things".

Quote from Hassabis: "So, what is our mission? We summarise it as <Build the world's first general learning machine>. So we always stress the word general and learning here the key things."

And the key slide (that I think cements the difference between what AGI stood for then, vs. now):

AI - one task vs. AGI - many tasks

at human level intelligence.

darepublic 2 days ago|||
It doesn't play chess? Just can parrot it very well
NitpickLawyer 2 days ago||
Yeah, maybe. But what matters is the end result. In the kaggle match, one of the games from the finals (grok vs o3) is rated by chesscom's stockfish as 1900 vs 2500. That is, they played a game at around those ratings.

For reference, the average chesscom player is ~900 elo, while the average FIDE rated player is ~1600. So, yeah. Parrot or not, the LLMs can make moves above the average player. Whatever that means.

darepublic 2 days ago||
I believe it will make illegal moves (unaided by any tools, of course). It will also make mistakes doing things like not being able to construct the board correctly given a FEN string. For these reasons I consider long strings of correct moves insufficient to say it can play the game. If my first two statements, about a propensity for illegal moves and other failures on "easy for humans" tasks, were untrue, then I would reconsider.
NitpickLawyer 2 days ago||
In the kaggle test they considered the match forfeit if the model could not produce a legal move after 3 tries (none of the matches in the finals were forfeited, they all ended with checkmate on the board). Again, chesscom's interface won't let you make illegal moves, and the average there is 900. Take that as you will.
vardump 2 days ago||||
No, just what has usually happened in the past with AI goalposts.

At first, just playing chess was considered to be a sign of intelligence. Of course, that was wrong, but not obvious at all in 1950.

krige 2 days ago||
You know, as the saying goes, if a metric becomes a target...
empath75 2 days ago||||
I don't think AGI is a useful concept, but if it exists at all, there's a very good argument that LLMs had it as soon as they could pass the Turing test reliably, which they accomplished years ago at this point.
root_axis 2 days ago||
LLMs do not pass the Turing test. It's very easy to tell when you're speaking with one.
A4ET8a8uTh0_v2 2 days ago||||
I think, given some of the signs on the horizon, there is a level of MAD-type bluffing going around, but some of the actions by various power centers suggest it is either close, people think it's close, or it is already there.
derektank 2 days ago|||
It wasn't the best definition of AGI, but I think if you had asked an interested layman 5 years ago whether a system that can pass the Turing test was AGI, they would have said yes.
jltsiren 2 days ago|||
An interested but uninformed layman.

When I was in college ~25 years ago, I took a class on the philosophy of AI. People had come up with a lot of weird ideas about AI, but there was one almost universal conclusion: that the Turing test is not a good test for intelligence.

The least weird objection was that the premise of the Turing test is unscientific. It sees "this system is intelligent" as a logical statement and seeks to prove or disprove it in an abstract model. But if you perform an experiment to determine if a real-world system is intelligent, the right conclusion for the system passing the test is that the system may be intelligent, but a different experiment might show that it's not.

nativeit 2 days ago|||
Douglas Hofstadter wrote Gödel, Escher, Bach nearly 50 years ago; it won a Pulitzer Prize and the National Book Award and was featured in the popular press. It’s been on lots of college reading lists, and from 2007 MIT offered online coursework based on it for high school students. The FBI concluded that the 2001 anthrax scare was in part inspired by elements of the book, which was found in the attacker’s trash. Anyone who’s wanted to engage with the theories and philosophy surrounding artificial intelligence has had plenty of material that asks and explores these same questions in depth.

A lot of people seem to think this is all bleeding-edge novelty (at least, the underlying philosophical and academic ideas getting discussed in popular media), but the industry is really predicated on very old philosophy + decades-old established technology + relatively recent neuroscience + modern financial engineering. That said, I don’t want to suggest a layperson is likely to have engaged with any of it, so I understand why this will be the first time a lot of people have ever considered some of these questions. I imagine what I’m feeling is fairly common to anyone with a very niche interest that blows up and becomes the topic of interest for the entire world.

I think there are probably some very interesting, as-yet undocumented phenomena occurring as a product of the unbelievably vast amount of resources sunk into what’s otherwise a fairly niche kind of utility (LLMs specifically, and machine learning more broadly). I’m optimistic that some very transformational technologies will come from it, although whether it will produce anything like “AGI”, or ever justify these levels of investment? Both seem rather unlikely.
MattRix 2 days ago|||
Isn’t that the point of trying to define it in a more rigorous way, like this paper is doing?
bigyabai 2 days ago|||
The authors acknowledge that this is entirely possible. Their work is just grounded in theory, after all:

> we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition.

righthand 2 days ago|||
I agree: if our comprehension of intelligence and “life” is incomplete, so is our model of artificial intelligence.
rafram 2 days ago||
Are you claiming that LLMs have achieved AGI?
moffkalast 2 days ago||
Compared to everything that came before they are fairly general alright.
empath75 2 days ago||
This is a serious paper by serious people and it is worth reading, but any definition of intelligence that depends on human beings as a reference will never be a good basis for evaluating non-human intelligence.

You could easily write the reverse of this paper that questions whether human beings have general intelligence by listing all the things that LLMs can do, which human beings can't -- for example producing a reasonably accurate summary of a paper in a few seconds or speaking hundreds of different languages with reasonable fluency.

You can always cherry-pick stuff that humans are capable of that LLMs are not capable of, and vice versa, and I don't think there is any reason to privilege certain capabilities over others.

I personally do not believe that "General Intelligence" exists as a quantifiable feature of reality, whether in humans or machines. It's phlogiston, it's the luminiferous ether. It's a dead metaphor.

I think what is more interesting is focusing on _specific capabilities_ that are lacking and how to solve each of them. I don't think it's at all _cheating_ to supplement LLM's with tool use, RAG, the ability to run python code. If intelligence can be said to exist at all, it is as part of a system, and even human intelligence is not entirely located in the brain, but is distributed throughout the body. Even a lot of what people generally think of as intelligence -- the ability to reason and solve logic and math problems typically requires people to _write stuff down_ -- ie, use external tools and work through a process mechanically.

SirMaster 1 day ago|
I don't think it's really AGI until you can simply task it with creating a new better version of itself and it can succeed in doing that all on its own.

A team of humans can and will make a GPT-6. Can a team of GPT-5 agents make GPT-6 all on their own if you give them the resources necessary to do so?

vages 1 day ago|
This is called Recursive AI, and is briefly mentioned in the paper.