ARC-AGI-3 - Hacker News

Posted by lairv 7 hours ago

ARC-AGI-3 (arcprize.org)

https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

218 points | 154 comments

Stevvo 6 hours ago|

Maybe I'm just not intelligent, but I gave it a couple of minutes and couldn't figure out WTF the game wants from you or how to win it.

Barbing 5 hours ago||

It's not about intelligence, Stevvo. Proof: this specific one took me under a minute to solve the first level ;)

If you've played Wordle you might've solved the game in a minute once before as well. And if you've played a bunch then you've perhaps also taken the entire day to solve it.

So why is it that today’s puzzle was so intuitive, but next month’s new puzzle shared here could be impossible? I'd want a more satisfying explanation than luck and the obvious “different things are different” (even though… yeah, different things are different).

culi 4 hours ago|||

It's not an IQ test, just a way to assess your ability to generalize rules. If you've played previous rounds, you kinda get used to the "style" of these games and it gets easier.

WarmWash 5 hours ago||

Once you figure out one game, it goes a long way towards figuring out all the rest. There are a lot of common general themes.

cedws 5 hours ago||

It's like playing The Witness. Somebody should set LLMs loose on that.

EternalFury 2 hours ago||

The real question is: Can it be generated using programs? If it can be, then LLMs will eventually monkey type these programs.

convexly 2 hours ago||

My issue with AGI benchmarks is you can never tell if you're measuring actual capability or just how much the training data overlapped with the test.

ranyume 5 hours ago||

This is an interesting update, and a big challenge for companies and labs. The new tools for measurement are indeed what I'd like out of future agents, and agents that solve the games will need to use different subsystems to do so. This is basically optimization for achieving goals (as opposed to prompt engineering / magic spells to make the LLM do what it is told to do), which imo is the future we should aspire to build.

jesse_dot_id 3 hours ago||

At this point, I'm pretty sure we'll just know when it happens.

hatthew 2 hours ago||

I'm not convinced. I wouldn't be surprised if GPT-2 to ChatGPT is the biggest single jump in "machine intelligence" we will ever see. I'd bet all gains in the future will be more incremental, at least until machines surpass humans by a large enough margin that it's difficult to qualify—let alone quantify—how big any given jump is.

Without a big jump, we're just going to boil the frog (ourselves).

neilellis 2 hours ago||

Unless it’s already happened and we missed it

threatripper 2 hours ago||

Or nobody is around anymore to notice when it happens.

spprashant 5 hours ago||

I played the demo, but it definitely took me a minute to grok the rules.

I don't know if this is how we want to measure AGI.

In general, I believe we should probably stop this pursuit of human-equivalent intelligence, which encourages people to think of these models as human replacements. LLMs are clearly good at a lot of things; let's focus on how we can augment and empower the existing workforce.

esafak 3 hours ago||

> ... let's focus on how we can augment and empower the existing workforce.

That is a nice sentiment but not what the AI companies are out to do; they want your job.

jachee 4 hours ago|||

Also, let's see if we can get the power and compute requirements brought down. Having to spin up a gigawatt power plant to achieve the same intelligence we humans power with sandwiches is a futile approach, imho.

fsdf2 4 hours ago||

Took me about 5 secs to figure it out tbh.

Surprised at the comments here re: not figuring it out. Simple game. Super annoying though lmao.

spprashant 3 hours ago||

It's simple, but it's not easy, is what I would say. Once you figure out the meta, you can work out most of it.

abraxas 5 hours ago||

Even if tomorrow's models get good enough to complete these games, we won't be able to proclaim AGI. In the realm of silly computer games alone, I'm going on record saying that there are plenty of 8-bit games that AIs will trip on even when this benchmark is crushed. 2D platformers like Manic Miner or Mario need skills that none of these games appear to capture.

WarmWash 5 hours ago||

Captcha's about to get wild.

Maybe the internet will briefly go back to a place mainly populated with outliers.

baron816 5 hours ago|

Looks like I’m generally unintelligent
