There are no new ideas in AI only new datasets

Posted by bilsbie 14 hours ago

There are no new ideas in AI only new datasets (blog.jxmo.io)

360 points | 180 comments

EternalFury 11 hours ago|

What John Carmack is exploring is pretty revealing. Train models to play 2D video games to a superhuman level, then ask them to play a level they have not seen before or another 2D video game they have not seen before. The transfer function is negative. So, in my definition, no intelligence has been developed, only expertise in a narrow set of tasks.

It’s apparently much easier to scare the masses with visions of ASI, than to build a general intelligence that can pick up a new 2D video game faster than a human being.

ozgrakkurt 3 hours ago||

Seeing comments here saying “this problem is already solved”, “he is just bad at this” etc. feels bad. He has given a long time to this problem by now. He is trying to solve this to advance the field. And needless to say, he is a legend in computer engineering or w/e you call it.

It should be required to point to the “solution” and maybe how it works to say “he just sucks” or “this was solved before”.

IMO the problem with current models is that they don’t learn categorically like: lions are animals, animals are alive. goats are animals, goats are alive too. So if lions have some property like breathing and goats also have it, it is likely that other similar things have the same property.
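To make the "categorical" idea concrete, here is a minimal sketch of property inheritance over an is-a hierarchy. The category names and properties are made up for illustration, and this is ordinary symbolic lookup, not a claim about how any current model works or should work.

```python
# Hypothetical is-a hierarchy and property table (illustrative only).
IS_A = {"lion": "animal", "goat": "animal", "animal": "living thing"}
PROPERTIES = {
    "living thing": {"alive"},
    "animal": {"breathes"},
    "lion": {"roars"},
}


def inherited_properties(kind):
    """Collect the properties of `kind` and of every category above it."""
    props = set()
    while kind is not None:
        props |= PROPERTIES.get(kind, set())
        kind = IS_A.get(kind)  # walk one level up; None at the root
    return props


print(sorted(inherited_properties("goat")))  # ['alive', 'breathes']
```

Nothing was ever stored about goats directly; everything known about them comes from the categories they belong to.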

Or when playing a game, a human can come up with a strategy like: I’ll level this ability and lean on it for starting, then I’ll level this other ability that takes more time to ramp up while using the first one, then change to this play style after I have the new ability ready. This might be formulated completely based on theoretical ideas about the game, and modified as the player gets more experience.

As far as I can understand, a current AI model will see the whole game as an optimization problem and try to find something at random that makes it win more. This is not as scalable as combining theory and experience in the way that humans do. For example, a human is innately capable of understanding that there is a concept of early game, and that the gains made in early game can compound and generate a large lead. This is pattern matching as well, but it is on a higher level.

Theory makes learning more scalable compared to just trying everything and seeing what works

vladimirralev 10 hours ago|||

He is not using appropriate models for this conclusion and neither is he using state of the art models in this research and moreover he doesn't have an expensive foundational model to build upon for 2d games. It's just a fun project.

A serious attempt at video/vision would involve some probabilistic latent space that can be noised in ways that make sense for games in general. I think veo3 proves that ai can generalize 2d and even 3d games, generating a video under prompt constraints is basically playing a game. I think you could prompt veo3 to play any game for a few seconds and it will generally make sense even though it is not fine tuned.

sigmoid10 8 hours ago|||

Veo3's world model is still pretty limited. That becomes obvious very fast once you prompt out of distribution video content (i.e. stuff that you are unlikely to find on youtube). It's extremely good at creating photorealistic surfaces and lighting. It even has some reasonably solid understanding of fluid dynamics for simulating water. But for complex human behaviour (in particular certain motions) it simply lacks the training data. Although that's not really a fault of the model and I'm pretty sure there will be a way to overcome this as well. Maybe some kind of physics based simulation as supplement training data.

altairprime 8 hours ago||||

Is any model currently known to succeed in the scenario that Carmack’s inappropriate model failed?

outofpaper 7 hours ago||

No monolithic models, but using hybrid approaches we've been able to beat humans for some time now.

altairprime 6 hours ago||

To confirm: hybrid approaches can demonstrate competence at newly-created video games within a short period of exposure, so long as similar game mechanics from other games were incorporated into their training set?

317070 8 hours ago||||

What you're thinking of is much more like the Genie model from DeepMind [0]. That one is like Veo, but interactive (but not publicly available).

[0] https://deepmind.google/discover/blog/genie-2-a-large-scale-...

Intralexical 4 hours ago||||

> I think veo3 proves that ai can generalize 2d and even 3d games, generating a video under prompt constraints is basically playing a game.

In the same way that keeping a dream journal is basically doing investigative journalism, or talking to yourself is equivalent to making new friends, maybe.

The difference is that while they may both produce similar, "plausible" output, one does so as a result of processes that exist in relation to an external reality.

keerthiko 8 hours ago||||

> generating a video under prompt constraints is basically playing a game

Besides static puzzles (like a maze or jigsaw), I don't believe this analogy holds. A model working with prompt constraints that aren't evolving or being added to over the course of "navigating" the generation of its output needs to process zero new information that it didn't come up with itself. Playing a game is different from other generation because it's primarily about reacting to input whose precise timing and spatial details you didn't know in advance, but which you can learn arrives within a known set of higher-order rules. Obviously, the more finite, deterministic, or predictably probabilistic the video game's solution space, the more it can be inferred from the initial state (that is, the more it reduces to the same type of problem as generating a video from a prompt), which is why models are still able to play video games. But as GP pointed out, the transfer function is negative in such cases: the overarching rules are not predictable enough across disparate genres.

> I think you could prompt veo3 to play any game for a few seconds

I'm curious what your threshold for what constitutes "play any game" is in this claim? If I wrote a script that maps button combinations to average pixel color of a portion of the screen buffer, by what metric(s) would veo3 be "playing" the game more or better than that script "for a few seconds"?

edit: removing knee-jerk reaction language

vladimirralev 8 hours ago|||

It's not ideal, but you can prompt it with an image of a game frame, explain the objects and physics in text and let it generate a few frames of gameplay as a substitute for controller input as well as what it expects as an outcome. I am not talking about real interactive gameplay.

I am just saying we have proof that it can understand complex worlds and sets of rules, and then abide by them. It doesn't know how to use a controller and it doesn't know how to explore the game physics on its own, but those steps are much easier to implement based on how coding agents are able to iterate and explore solutions.

hluska 8 hours ago|||

[flagged]

keerthiko 8 hours ago||

fair, and I edited my choice of words, but if you're reading that much aggression from my initial comment (which contains topical discussion) to say what you did, you must find the internet a far more savage place than it really is :/

hluska 7 hours ago||

[flagged]

troupo 8 hours ago||||

> I think veo3 proves that ai can generalize 2d and even 3d games

It doesn't. And you said it yourself:

> generating a video under prompt constraints is basically playing a game.

No. It's neither generating a game (that people can play) nor is it playing a game (it's generating a video).

Since it's not a model of the world in any sense of the word, there are issues with even the most basic object permanence. E.g. here's veo3 generating a GTA-style video. Oh look, the car spins 360 and ends up on a completely different street than the one it was driving down previously: https://www.youtube.com/watch?v=ja2PVllZcsI

vladimirralev 7 hours ago||

It is still doing a great job for a few frames, and you could keep it more anchored to the state of the game if you prompt it, much like you can prompt coding agents to keep a log of all decisions previously made. Permanence is excellent; it slips often, but that is mostly because it is not grounded to a specific game state by the prompt or by the decision log.

pshc 7 hours ago|||

I think we need a spatial/physics model handling movement and tactics watched over by a high level strategy model (maybe an LLM).

YokoZar 10 hours ago|||

I wonder if this is a case of overfitting from allowing the model to grow too large, and if you might cajole it into learning more generic heuristics by putting some constraints on it.

It sounds like the "best" AI without constraint would just be something like a replay of a record speedrun rather than a smaller set of heuristics of getting through a game, though the latter is clearly much more important with unseen content.

smokel 10 hours ago|||

The subject you are referring to is most likely Meta-Reinforcement Learning [1]. It is great that John Carmack is looking into this, but it is not a new field of research.

[1] https://instadeep.com/2021/10/a-simple-introduction-to-meta-...

justanotherjoe 10 hours ago|||

I don't get why people are so invested in framing it this way. I'm sure there are ways to do the stated objective. John Carmack isn't even an AI guy; why is he suddenly the standard?

GuB-42 6 hours ago|||

Who is an "AI guy"? The field as we know it is fairly new. Sure, neural nets are old hat, but a lot has happened in the last few years.

John Carmack founded Keen Technologies in 2022 and has been working seriously on AI since 2019. From his experience in the video game industry, he knows a thing or two about linear algebra and GPUs, that is, the underlying maths and the underlying hardware.

So, for all intents and purposes, he is an "AI guy" now.

amelius 6 hours ago||

But the logic seems flawed.

He has built an AI system that fails to do X.

That does not mean there isn't an AI system that can do X. Especially considering that a lot is happening in AI, as you say.

Anyway, Carmack knows a lot about optimizing computations on modern hardware. In practice, that happens to be also necessary for AI. However, it is not __sufficient__ for AI.

nkmnz 3 minutes ago|||

This is exactly how Science works. He’s right until proven wrong. And so are you.

PeeMcGee 14 minutes ago||||

> That does not mean there isn't an AI system that can do X.

You are holding the burden of proof here...

gerdesj 4 hours ago|||

"He has built an AI system that fails to do X."

Perhaps you have put your finger on the fatal flaw ...

qaq 9 hours ago||||

Keen includes researchers like Richard Sutton, Joseph Modayil etc. Also John has been doing it full time for almost 5 years now, so given his background and aptitude for learning I would imagine that by this time he is more of an AI guy than a fairly large percentage of AI PhDs.

surecoocoocoo 5 hours ago||||

Ah some No True Scotsman

Not sure why justanotherjoe is a credible resource on who is and isn’t expert in some new dialectic and euphemism for machine state management. You’re that nobody to me :shrug:

Yann LeCun is an AI guy and has simplified it as “not much more than physical statistics.”

Whole lot of AI is decades-old info theory books applied to modern computers.

Either a mem value is or isn’t what’s expected. Either an entire matrix of values is or isn’t what’s expected. Store the results of some such rules. There’s your model.

The words are made up and arbitrary because human existence is arbitrary. You’re being sold on a bridge to nowhere.

varjag 9 hours ago||||

What in your opinion constitutes an AI guy?

energy123 4 hours ago||||

Credentialism is bad, especially when used as a stick

raincole 8 hours ago||||

Because it "confirms" what they already believe in.

refulgentis 9 hours ago||||

Names >> all, and increasingly so.

One phenomenon that bared this to me, in a substantive way, was noticing an increasing # of reverent comments re: Geohot in odd places here, that are just as quickly replied to by people with a sense of how he works, as opposed to the keywords he associates himself with. But that only happens here AFAIK.

Yapping, or, inducing people to yap about me, unfortunately, is much more salient to my expected mindshare than the work I do.

It's getting claustrophobic intellectually, as a result.

Example from the last week is the phrase "context engineering" - Shopify CEO says he likes it better than prompt engineering, Karpathy QTs to affirm, SimonW writes it up as fait accompli. Now I have to rework my site to not use "prompt engineering" and have a Take™ on "context engineering". Because of a couple tweets + a blog reverberating over 2-3 days.

Nothing against Carmack, or anyone else named, at all. i.e. in the context engineering case, they're just sharing their thoughts in realtime. (i.e. I don't wanna get rolled up into a downvote brigade because it seems like I'm affirming the loose assertion Carmack is "not an AI guy", or, that it seems I'm criticizing anyone's conduct at all)

EDIT: The context engineering example was not in reference to another post at the time of writing, now one is the top of front page.

dvfjsdhgfv 8 hours ago||

> Now I have to rework my site to not use "prompt engineering" and have a Take™ on "context engineering". Because of a couple tweets + a blog reverberating over 2-3 days.

The difference here is that your example shows a trivial statement and a change period of 3 days, whereas what Carmack is doing is taking years.

refulgentis 7 hours ago||

Right. Nothing against Carmack. Grew up on the guy. I haven't looked into any of the disputed stuff at all, and should actively proclaim I'm a yuge Carmack fanboy.

sieabahlpark 9 hours ago|||

[dead]

Uehreka 8 hours ago|||

These questions of whether the model is “really intelligent” or whatever might be of interest to academics theorizing about AGI, but to the vast swaths of people getting useful stuff out of LLMs, it doesn’t really matter. We don’t care if the current path leads to AGI. If the line stopped at Claude 4 I’d still keep using it.

And like I get it, it’s fun to complain about the obnoxious and irrational AGI people. But the discussion about how people are using these things in their everyday lives is way more interesting.

bthornbury 6 hours ago|||

This generalization issue in RL in specific was detailed by OpenAI in 2018

https://arxiv.org/pdf/1804.03720

ferguess_k 10 hours ago|||

Can you please explain "the transfer function is negative"?

I'm wondering whether one has tested with the same model but on two situations:

1) Bring it to superhuman level in game A and then present game B, which is similar to A, to it.

2) Present B to it without presenting A.

If 1) is not significantly better than 2) then maybe it is not carrying much "knowledge", or maybe we simply did not program it correctly.
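A rough sketch of how that comparison could be scored, assuming you log evaluation scores on game B at the same checkpoints for both runs. The function name and the numbers are hypothetical; real experiments would also want multiple seeds and confidence intervals.

```python
def transfer_score(with_pretraining, from_scratch):
    """Mean score gap on game B between the two runs.

    Positive: pretraining on A helped. Near zero: nothing carried over.
    Negative: pretraining on A actively hurt (negative transfer).
    """
    if not with_pretraining or len(with_pretraining) != len(from_scratch):
        raise ValueError("need two non-empty curves of equal length")
    gaps = [p - s for p, s in zip(with_pretraining, from_scratch)]
    return sum(gaps) / len(gaps)


# Made-up learning curves on game B, one score per evaluation checkpoint.
pretrained_on_a = [1.0, 2.0, 3.0]
scratch = [2.0, 3.0, 4.0]
print(transfer_score(pretrained_on_a, scratch))  # -1.0
```

A negative value like the one above is what "the transfer function is negative" would look like in this framing.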

Zanfa 15 minutes ago|||

According to Carmack's recent talk [0], SOTA models that have been trained on game A don't perform better or train faster on game B. Even worse, training on game B negatively affects performance in game A when returning to it.

[0] https://www.youtube.com/watch?v=3pdlTMdo7pY

tough 10 hours ago|||

I think the problem is we train models to pattern match, not to learn or reason about world models

magicalhippo 4 hours ago|||

In the Physics of Language Models[1] they argue that you must augment your training data by changing sentences and such, in order for the model to be able to learn the knowledge. As I understand their argument, language models don't have a built-in way to detect what is important information and what is not, unlike us. Thus the training data must aid it by presenting important information in many different ways.

Doesn't seem unreasonable that the same holds in a gaming setting, that one should train on many variations of each level. Change the lengths of halls connecting rooms, change the appearance of each room, change power-up locations etc, and maybe even remove passages connecting rooms.

[1]: https://physics.allen-zhu.com/part-3-knowledge/part-3-1
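As a sketch of what "many variations of each level" might look like, assuming a level can be described as rooms joined by halls. The data layout, the names, and the probabilities here are all hypothetical, not taken from any real game or from the linked work.

```python
import random

# Hypothetical base level: hall lengths between rooms, plus power-up spots.
BASE_LEVEL = {
    "halls": {("A", "B"): 5, ("B", "C"): 8, ("A", "C"): 12},
    "power_ups": {"A": "shield", "C": "speed"},
}
ROOM_STYLES = ["stone", "metal", "wood"]


def make_variant(level, rng, drop_prob=0.2):
    """Return a randomized copy of `level`.

    Varies hall lengths, room appearance, and power-up locations, and
    occasionally removes a passage. Note: this sketch does not check
    that the level stays connected after passages are removed.
    """
    halls = {}
    for pair, length in level["halls"].items():
        if rng.random() < drop_prob:
            continue  # remove this passage in this variant
        halls[pair] = max(1, length + rng.randint(-2, 2))
    rooms = sorted({room for pair in level["halls"] for room in pair})
    items = list(level["power_ups"].values())
    spots = rng.sample(rooms, len(items))  # new, distinct power-up rooms
    return {
        "halls": halls,
        "power_ups": dict(zip(spots, items)),
        "styles": {room: rng.choice(ROOM_STYLES) for room in rooms},
    }


variants = [make_variant(BASE_LEVEL, random.Random(seed)) for seed in range(5)]
```

Seeding each variant separately keeps the training set reproducible while still exposing the agent to different layouts.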

singron 10 hours ago||||

I think this is clearly a case of over fitting and failure to generalize, which are really well understood concepts. We don't have to philosophize about what pattern matching really means.

NBJack 10 hours ago||||

In other words, they learn the game, not how to play games.

fsmv 10 hours ago|||

They memorize the answers not the process to arrive at answers

EternalFury 8 hours ago|||

They learn the value of specific actions in specific contexts based on the rewards they received during their play time. Specific actions and specific contexts are not transferable for various reasons. John quoted that varying frame rates and variable latency between action and effect really confuse the models.

nightpool 7 hours ago||

Okay, so fuzz the frame rate and latency? That feels very easy to fix.
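Something like this, as a toy sketch. The environment interface (a step(action) call returning observation, reward, done) and both class names are assumptions for illustration; whether this kind of randomization actually fixes the transfer problem is exactly what is in dispute.

```python
import random
from collections import deque


class CounterEnv:
    """Toy stand-in environment: the observation is a running sum of actions."""

    def __init__(self):
        self.total = 0
        self.frames = 0

    def step(self, action):
        self.total += action
        self.frames += 1
        return self.total, float(action), self.frames >= 100


class JitterWrapper:
    """Picks a random action latency and frame skip for each episode."""

    def __init__(self, env, rng, max_delay=3, max_skip=4, noop=0):
        self.env = env
        self.delay = rng.randint(0, max_delay)  # steps before an action lands
        self.skip = rng.randint(1, max_skip)    # frames each action is held
        self.queue = deque([noop] * self.delay)

    def step(self, action):
        self.queue.append(action)
        delayed = self.queue.popleft()  # the action chosen `delay` steps ago
        reward = 0.0
        for _ in range(self.skip):
            obs, r, done = self.env.step(delayed)
            reward += r
            if done:
                break
        return obs, reward, done


env = JitterWrapper(CounterEnv(), random.Random(0))
obs, reward, done = env.step(1)
```

Creating a fresh wrapper per episode means the agent never gets to rely on one fixed timing.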

wredcoll 5 hours ago||

Good point, you should write to John Carmack and let him know you've figured out the problem.

IshKebab 10 hours ago|||

This has been disproven so many times... They clearly do both. You can trivially prove this yourself.

0xWTF 10 hours ago||

> You can trivially prove this yourself.

Given the long list of dead philosophers of mind, if you have a trivial proof, would you mind providing a link?

pdabbadabba 9 hours ago|||

It’s really easy: go to Claude and ask it a novel question. It will generally reason its way to a perfectly good answer even if there is no direct example of it in the training data.

keerthiko 8 hours ago|||

When LLMs come up with answers to questions that aren't directly exampled in the training data, that's not proof at all that they reasoned their way there; it can very much still be pattern matching without insight from the actual code execution of the answer generation.

If we were taking a walk and you asked me for an explanation for a mathematical concept I have not actually studied, I am fully capable of hazarding a casual guess based on the other topics I have studied within seconds. This is the default approach of an LLM, except with much greater breadth and recall of studied topics than I, as a human, have.

This would be very different than if we sat down at a library and I applied the various concepts and theorems I already knew to make inferences, built upon them, and then derived an understanding based on reasoning of the steps I took (often after backtracking from several reasoning dead ends) before providing the explanation.

If you ask an LLM to explain their reasoning, it's unclear whether it just guessed the explanation and reasoning too, or if that was actually the set of steps it took to get to the first answer they gave you. This is why LLMs are able to correct themselves after claiming strawberry has 2 rs, but when providing (guessing again) their explanations they make more "relevant" guesses.

IshKebab 8 hours ago||

LLMs clearly don't reason in the same way that humans or SMT solvers do. That doesn't mean they aren't reasoning.

MichaelZuo 8 hours ago|||

How do you know it’s a novel question?

hackinthebochs 5 hours ago|||

You have probably seen examples of LLMs doing the "mirror test", i.e. identifying themselves in screenshots and referring to the screenshot from the first person. That is a genuinely novel question as an "LLM mirror test" wasn't a concept that existed before about a year ago.

MichaelZuo 5 hours ago||

Elephant mirror tests existed, so it doesn’t seem all that novel when the word “elephant” could just be substituted for the word “LLM”?

hackinthebochs 4 hours ago||

The question isn't about universal novelty, but whether the prompt/context is novel enough such that the LLM answering competently demonstrates understanding. The claim of parroting is that the dataset contains a near exact duplicate of any prompt and so the LLM demonstrating what appears to be competence is really just memorization. But if an LLM can generalize from an elephant mirror test to an LLM mirror test in an entirely new context (showing pictures and being asked to describe it), that demonstrates sufficient generalization to "understand" the concept of a mirror test.

IshKebab 8 hours ago|||

It's not exactly difficult to come up with a question that's so unusual the chance of it being in the training set is effectively zero.

MichaelZuo 7 hours ago|||

Can you provide some examples of these genuinely unique questions?

troupo 8 hours ago|||

And as any programmer will tell you: they immediately devolve into "hallucinating" answers, not trying to actually reason about the world. Because that's what they do: they create statistically plausible answers even if those answers are complete nonsense.

IshKebab 8 hours ago|||

Just go and ask ChatGPT or Claude something that can't possibly be in its training set. Make something up. If it is only memorising answers then it will be impossible for it to get the correct result.

A simple nonsense programming task would suffice. For example "write a Python function to erase every character from a string unless either of its adjacent characters are also adjacent to it in the alphabet. The string only contains lowercase a-z"

That task isn't anywhere in its training set so they can't memorise the answer. But I bet ChatGPT and Claude can still do it.
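For reference, here is one hand-written reading of that task. The wording is ambiguous, so this sketch assumes that "adjacent in the alphabet" means the letters differ by exactly one position, with no wrap-around between a and z, and that a character is kept if at least one of its neighbours in the string qualifies.

```python
def keep_alphabet_neighbours(text):
    """Keep only characters with a string neighbour one letter away."""

    def adjacent(a, b):
        return abs(ord(a) - ord(b)) == 1

    kept = []
    for i, ch in enumerate(text):
        left = i > 0 and adjacent(ch, text[i - 1])
        right = i + 1 < len(text) and adjacent(ch, text[i + 1])
        if left or right:
            kept.append(ch)
    return "".join(kept)


print(keep_alphabet_neighbours("abxcd"))  # abcd
print(keep_alphabet_neighbours("hello"))  # prints an empty line
```

Different readings of the prompt (for example, wrapping z to a) would give different answers, which is worth keeping in mind when judging a model's output.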

Honestly this is sooooo obvious to anyone that has used these tools, it's really insane that people are still parroting (heh) the "it just memorises" line.

imiric 8 hours ago|||

LLMs don't "memorize" concepts like humans do. They generate output based on token patterns in their training data. So instead of having to be trained on every possible problem, they can still generate output that solves it by referencing the most probable combination of tokens for the specified input tokens. To humans this seems like they're truly solving novel problems, but it's merely a trick of statistics. These tools can reference and generate patterns that no human ever could. This is what makes them useful and powerful, but I would argue not intelligent.

troupo 8 hours ago|||

People who say that LLMs memorize stuff are just as clueless as those who assume that there's any reasoning happening.

They generate statistically plausible answers (to simplify the answer) based on the training set and weights they have.

Tijdreiziger 5 hours ago||

What if that’s all we’re doing, though?

IshKebab 10 hours ago||||

Well yeah... If you only ever played one game in your life you would probably be pretty shit at other games too. This does not seem very revealing to me.

trainerxr50 7 hours ago||

I am decent at chess but barely know how the pieces in Go move.

Of course, this is because I have spent a lot of time TRAINING to play chess and basically none training to play go.

I am good on guitar because I started training young but can't play the flute or piano to save my life.

Most complicated skills have basically no transfer or carry over other than knowing how to train on a new skill.

beefnugs 9 hours ago|||

yeahhhh why isn't there a training structure where you play 5000 games, and the reward function is based on doing well in all of them?

I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer term goals. "press whatever sequence over this time i need to do to end up closer to this result"

There is some kind of nested multidimensional thing to train on here instead of immediate limited choices
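A small sketch of what "doing well in all of them" could mean as a number, assuming each game has a random-play baseline and a reference score to normalize against. The game names and scores below are invented, and this only covers scoring, not the harder question of how to train against it.

```python
def normalized(score, random_score, reference_score):
    """0.0 means random-level play, 1.0 means reference-level play."""
    return (score - random_score) / (reference_score - random_score)


def multi_game_objective(results):
    """Aggregate per-game results into a mean and a worst-case score.

    `results` maps game name -> (score, random_score, reference_score).
    The mean rewards average skill; the worst case punishes an agent
    that is excellent at a few games and useless at the rest.
    """
    per_game = [normalized(*triple) for triple in results.values()]
    return {"mean": sum(per_game) / len(per_game), "worst": min(per_game)}


# Invented numbers for two hypothetical games.
results = {"game_1": (50.0, 0.0, 100.0), "game_2": (10.0, 10.0, 20.0)}
print(multi_game_objective(results))  # {'mean': 0.25, 'worst': 0.0}
```

Optimizing the worst-case term is one way to express "all of them" rather than "most of them".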

ferguess_k 10 hours ago||||

I kinda think I'm more or less the same...OK maybe we have different definitions of "pattern matching".

veqz 9 hours ago||

It's Plato's cave:

We train the models on what are basically shadows, and they learn how to pattern match the shadows.

But the shadows are only depictions of the real world, and the LLMs never learn about that.

EternalFury 8 hours ago||

100%

antisthenes 10 hours ago||||

Where do you draw the line between pattern matching and reasoning about world models?

A lot of intelligence is just pattern matching and being quick about it.

halfcat 6 hours ago||

The line is: building an internal world model requires interfacing with the world, not a model of it, and subsequent failing (including death and survivorship over generations) and adaptation. Plus pattern matching.

Current AI only does one of those (pattern matching, not evolution), and the prospects of simulating evolution is kind of bleak, given I don’t think we can simulate a full living cell yet from scratch? Building a world model requires life (or something that has undergone a similar evolutionary survivorship path), not something that mimics life.

fullshark 8 hours ago|||

Just sounds like an example of overfitting. This is all machine learning at its root.

SquibblesRedux 2 hours ago|||

Indeed, it's nothing but function fitting.

goatlover 9 hours ago|||

I've wondered about the claim that the models played those Atari/2D video games at superhuman levels, because I clearly recall some humans achieving superhuman levels before models were capable of it. Must have been superhuman compared to average human player, not someone who spent an inordinate amount of time mastering the game.

raincole 8 hours ago||

I'm not sure why you think so. AI outperforms humans in many games already. Basically all the games we care to put money to train a model.

AI has beat the best human players in Chess, Go, Mahjong, Texas hold'em, Dota, Starcraft, etc. It would be really, really surprising that some Atari game is the holy grail of human performance that AI cannot beat.

tsimionescu 8 hours ago||

I recall this not being true at all for Dota and Starcraft. I recall AlphaStar performed much better than the top non-pro players, but it couldn't consistently beat the pro players with the budget that Google was willing to spend, and I believe the same was true of Dota II (and there they were even playing a limited form of the game, with fewer heroes and without the hero choice part, I believe).

wredcoll 5 hours ago||

As I recall, the Starcraft ones heavily involved being able to exploit the computer's advantage in "twitch" speed over any human, it's just a slightly more complicated way of how any aim-bot enabled AI will always beat a human in an FPS, the game is designed to reward a certain amount of physical speed and accuracy.

In other words, the Starcraft AIs that win do so by microing every single unit in the entire game at the same time, which is pretty clever, but if you reduce them to interfacing with the game in the same way a human does, they start losing.

One of my pet peeves when we talk about the various chess engines is yes, given a board state they can output the next set of moves to beat any human, but can they teach someone else to play chess? I'm not trying to activate some kinda "gotcha" here, just getting at what does it actually mean to "know how to play chess". We'd expect any human that claimed to know how to play to be able to teach any other human pretty trivially.

hluska 8 hours ago|||

When I finished my degree, the idea that a software system could develop that level of expertise was relegated to science fiction. It is an unbelievable human accomplishment to get to that point and honestly, a bit of awe makes life more pleasant.

Less quality of life focused, I don’t believe that the models he uses for this research are capable of more. Is it really that revealing?

moralestapia 10 hours ago|||

I wonder how much performance decreases if they just use slightly modified versions of the same game. Like a different color scheme, or a couple different sprites.

t55 10 hours ago||

this is what deepmind did 10 years ago lol

smokel 8 hours ago||

No, they (and many others before them) are genuinely trying to improve on the original research.

The original paper "Playing Atari with Deep Reinforcement Learning" (2013) from Deepmind describes how agents can play Atari games, but these agents would have to be specifically trained on every individual game using millions of frames. To accomplish this, simulators were run in parallel, and much faster than in real-time.

Also, additional trickery was added to extract a reward signal from the games, and there is some minor cheating on supplying inputs.

What Carmack (and others before him) is interested in, is trying to learn in a real-life setting, similar to how humans learn.

voxleone 11 hours ago||

I'd say with confidence: we're living in the early days. AI has made jaw-dropping progress in two major domains: language and vision. With large language models (LLMs) like GPT-4 and Claude, and vision models like CLIP and DALL·E, we've seen machines that can generate poetry, write code, describe photos, and even hold eerily humanlike conversations.

But as impressive as this is, it’s easy to lose sight of the bigger picture: we’ve only scratched the surface of what artificial intelligence could be — because we’ve only scaled two modalities: text and images.

That’s like saying we’ve modeled human intelligence by mastering reading and eyesight, while ignoring touch, taste, smell, motion, memory, emotion, and everything else that makes our cognition rich, embodied, and contextual.

Human intelligence is multimodal. We make sense of the world through:

Touch (the texture of a surface, the feedback of pressure, the warmth of skin); Smell and taste (deeply tied to memory, danger, pleasure, and even creativity); Proprioception (the sense of where your body is in space: how you move and balance); Emotional and internal states (hunger, pain, comfort, fear, motivation).

None of these are captured by current LLMs or vision transformers. Not even close. And yet, our cognitive lives depend on them.

Language and vision are just the beginning — the parts we were able to digitize first - not necessarily the most central to intelligence.

The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.

dinfinity 11 hours ago||

> Language and vision are just the beginning — the parts we were able to digitize first - not necessarily the most central to intelligence.

I respectfully disagree. Touch gives pretty cool skills, but language, video and audio are all that are needed for all online interactions. We use touch for typing and pointing, but that is only because we don't have a more efficient and effective interface.

Now I'm not saying that all other senses are uninteresting. Integrating touch, extensive proprioception, and olfaction is going to unlock a lot of 'real world' behavior, but your comment was specifically about intelligence.

Compare humans to apes and other animals and the thing that sets us apart is definitely not in the 'remaining' senses, but firmly in the realm of audio, video and language.

voxleone 10 hours ago||

> Language and vision are just the beginning — the parts we were able to digitize first - not necessarily the most central to intelligence.

I probably made a mistake when i asserted that -- should have thought it over. Vision is evolutionarily older and more “primitive”, while language is uniquely human [or maybe, more broadly, primate, cetacean, cephalopod, avian...] symbolic, and abstract — arguably a different order of cognition altogether. But i maintain that each and every sense is important as far as human cognition -- and its replication -- is concerned.

dinfinity 26 minutes ago|||

Vision is interesting in that it leverages the maximum speed with which it is easily possible to gather information about our surroundings in this universe. I believe that is what makes it special and very valuable. I also believe this aspect makes it a strong attractor for convergent evolution.

Language allows encoding and compression of information about the world, which is of course incredibly powerful and increases communication bandwidth enormously (as well as tons of other stuff).

I'd say that for high level cognitive processes, hearing and speaking were an important stepping stone because for some reason evolving organs that can generate relatively high bandwidth signals in audio seems to be easier than evolving something that does that for visuals (very few Teletubby screens on tummies in the natural world).

Interesting games to think about in this sense: Pictionary/drawing games and charades.

wizzwizz4 9 hours ago|||

People who lack one of those senses, or even two of them, tend to do just fine.

oasisaimlessly 6 hours ago||

Mostly thanks to other humans helping them.

If all humans lacked vision, the human race would definitely not do just fine.

mr_world 9 hours ago|||

Organic adaption and persistence of memory I would say are the two major advancements that need to happen.

Human neural networks are dynamic, they change and rearrange, grow and sever. An LLM is fixed and relies on context, if you give it the right answer it won't "learn" that is the correct answer unless it is fed back into the system and trained over months. What if it's only the right answer for a limited period of time?

To build an intelligent machine, it must be able to train itself in real time and remember.

specialist 7 hours ago||

Yes and: and forget.

chasd00 10 hours ago|||

> Language and vision are just the beginning..

Based on the architectures we have they may also be the ending. There’s been a lot of news in the past couple years about LLMs but has there been any breakthroughs making headlines anywhere else in AI?

dragonwriter 10 hours ago|||

> There’s been a lot of news in the past couple years about LLMs but has there been any breakthroughs making headlines anywhere else in AI?

Yeah, lots of stuff tied to robotics, for instance; this overlaps with vision, but the advances go beyond vision.

Audio has seen quite a bit. And I imagine there is stuff happening in niche areas that just aren't as publicly interesting as language, vision/imagery, audio, and robotics.

nomel 9 hours ago||||

Two Nobel prizes, one in physics and one in chemistry: https://www.nature.com/articles/s41746-024-01345-9

edanm 9 hours ago|||

Sure. In physics, math, chemistry, biology. To name a few.

Swizec 11 hours ago|||

> The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.

Like Dr. Who said: DALEKs aren't brains in a machine, they are the machine!

Same is true for humans. We really are the whole body, we're not just driving it around.

nomel 9 hours ago||

There are many people who mentally developed while paralyzed that literally drive around their bodies via motorized wheelchair. I don't think there's any evidence that a brain couldn't exist or develop in a jar, given only the inputs modern AI now has (text, video, audio).

Swizec 9 hours ago||

> any evidence that a brain couldn't exist or develop in a jar

The brain could. Of course it could. It's just a signals processing machine.

But would it be missing anything we consider core to the way humans think? Would it struggle with parts of cognition?

For example: experiments were done with cats growing up in environments with vertical lines only. They were then put in a normal room and had a hard time understanding flat surfaces.

https://computervisionblog.wordpress.com/2013/06/01/cats-and...

nomel 8 hours ago|||

This isn't remotely a hypothetical, so I imagine there are some examples out there, especially from back when polio was a problem. Although, for practical reasons, they might have had limited exposure to novelty, which could have negative consequences.

Swizec 3 hours ago||

I agree it’s not hypothetical and also as a layperson I don’t know how much impact on cognition has been studied. Would be cool if it has!

I do know of studies that showed blind people start using their visual cortex to process sounds. That is pretty cool imo

skydhash 11 hours ago|||

Yeah, but are there new ideas or only wishes?

jdgoesmarching 10 hours ago||

It’s pure magical thinking that would be correctly dismissed if it didn’t have AI attached to it. Imagine talking this way about anything else.

“We’ve barely scratched the surface with Rust, so far we’re only focused on code and haven’t even explored building mansions or ending world hunger”

tim333 7 hours ago||

AI has some real possibilities of building mansions and ending hunger in a way that Rust doesn't.

timewizard 5 hours ago||

> has made jaw-dropping progress

They took 1970s dead tech and deployed it on machines 1 million times more powerful. I'm not sure I'd qualify this as progress. I'd also need an explanation of what systemic improvements in models and computation are planned that would give exponential growth in performance.

I don't see anything.

ekunazanu 2 hours ago|||

Winning two Nobel prizes wasn't enough progress?

petesergeant 56 minutes ago|||

> They took 1970s dead tech and deployed it on machines 1 million times more powerful. I’m not sure I’d qualify this as progress

If this isn’t meant to be sarcasm or irony, you’ve got some really exciting research and learning ahead of you! At the moment it reads very “computers are just addition and multiplication and we’ve had that for thousands of years!”

tippytippytango 12 hours ago||

Sometimes we get confused by the difference between technological and scientific progress. When science makes progress it unlocks new S-curves that progress at an incredible pace until you get into the diminishing returns region. People complain of slowing progress but it was always slow, you just didn’t notice that nothing new was happening during the exponential take off of the S-curve, just furious optimization.

baxtr 9 hours ago||

Fully agree.

And at the same time I have noticed that people don’t understand the difference between an S-curve and an exponential function. They can look almost identical at certain intervals.
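
A rough numeric sketch of that point, with made-up parameters (the growth rate of 1.0 and the ceiling of 1000 are assumptions picked for illustration, not fitted to anything): a logistic curve that starts at the same value as an exponential tracks it closely early on and only later peels away.

```python
import math

def exponential(t, rate=1.0):
    """Unbounded exponential growth, equal to 1.0 at t = 0."""
    return math.exp(rate * t)

def logistic(t, rate=1.0, ceiling=1000.0):
    """S-curve that also equals 1.0 at t = 0 but saturates at `ceiling`."""
    return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-rate * t))

def relative_gap(t, rate=1.0, ceiling=1000.0):
    """How far the S-curve has fallen behind the exponential, as a fraction."""
    e = exponential(t, rate)
    return abs(e - logistic(t, rate, ceiling)) / e

if __name__ == "__main__":
    for t in (0, 2, 4, 6, 8, 10):
        print(t, round(exponential(t), 1), round(logistic(t), 1), round(relative_gap(t), 3))
```

With these assumed numbers the two curves differ by under 1% at t = 2 and by more than 90% at t = 10, so which one you think you are on depends heavily on where you are standing.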

protocolture 2 hours ago|||

As far back as 2017 I copped a lot of flak for suggesting that the coming automation revolution will be great at copying office workers and artists but won't be on the order of replacing the whole human race. A lot of the time Moore's law got thrown back in my face. But that's how this works: we unlock something new, we exploit it as far as possible, the shine wears off and we deal with the aftermath.

Zacharias030 2 hours ago|||

The crypto mind cannot comprehend

timewizard 5 hours ago|||

You're being awfully generous to describe basic hype as "technological progress."

strangescript 9 hours ago||

If you work with model architecture and read papers, how could you not know there is a flood of new ideas? Only a few yield interesting results though.

I kind of wonder if libraries like pytorch have hurt experimental development. So many basic concepts no one thinks about anymore because they just use the out of the box solutions. And maybe those solutions are great and those parts are "solved", but I am not sure. How many models are using someone else's tokenizer, or someone else's strapped on vision model just to check a box in the model card?

delifue 15 minutes ago||

The hardware(GPU)'s architectural limitations may slow research more than PyTorch. The hardware lottery https://hardwarelottery.github.io/

thenaturalist 8 hours ago|||

That's been the very normal way of the human world.

When the foundation layer at a given moment doesn't yield an ROI on intellectual exploration (say, because you can overcompensate with VC-funded raw compute and make more progress elsewhere), few(er) will go there.

But inevitably, as other domains reach diminishing returns, bright minds will take a look around where significant gains for their effort can be found.

And so will the next generation of PyTorch or foundational technologies evolve.

kevmo314 8 hours ago|||

The people who don't think about such things probably wouldn't develop experimentally sans pytorch either.

mardifoufs 5 hours ago|||

Yeah and even then, it's been like ~2-3 years since the last rather major architectural improvement, major enough for a lot of people to actually hear about it and use it daily. I think some people lose perspective on how short of a time frame 3 years is.

But yes, there's a ton of interesting and useful stuff (beyond datasets and data related improvements) going on right now, and I'm not even talking about LLMs. I don't do anything related to LLMs and even then I still see tons of new stuff popping up regularly.

_giorgio_ 5 hours ago||

It's the opposite.

Frameworks like pytorch are really flexible. You can implement any architecture, and if it's not enough, you can learn CUDA.

Keras is the opposite; it's probably more like what you describe.

kogus 12 hours ago||

To be fair, if you imagine a system that successfully reproduced human intelligence, then 'changing datasets' would probably be a fair summary of what it would take to have different models. After all, our own memories, training, education, background, etc are a very large component of our own problem solving abilities.

jschveibinz 12 hours ago||

I will respectfully disagree. All "new" ideas come from old ideas. AI is a tool to access old ideas with speed and with new perspectives that haven't been available up until now.

Innovation is in the cracks: recognition of holes, intersections, tangents, etc. on old ideas. It has been said that innovation is done on the shoulders of giants.

So AI can be an express elevator up to an army of giants' shoulders? It all depends on how you use the tools.

alfalfasprout 12 hours ago||

Access old ideas? Yes. With new perspectives? Not necessarily. An LLM may be able to assist in interpreting data with new perspectives but in practice they're still fairly bad at greenfield work.

As with most things, the truth lies somewhere in the middle. LLMs can be helpful as a way of accelerating certain kinds and certain aspects of research but not others.

stevep98 10 hours ago||

> Access old ideas? Yes. With new perspectives?

I wonder if we can mine patent databases for old ideas that never worked out in the past, but now are more useful. Perhaps due to modern machining or newer materials or just new applications of the idea.

baxtr 9 hours ago|||

Imagine a human had read every book/publication in every field of knowledge that mankind has ever produced AND couldn’t come up with anything entirely new. Hard to imagine.

melagonster 27 minutes ago|||

The difficult part is proposing an experiment to check a new idea.

mdaniel 3 hours ago||||

My hypothesis of the mismatch is centered around "read" - I think that when you wrote it, and when others similarly think about that scenario, the surprise is because our version of "read" is the implied "read and internalized" or at bare minimum "read for comprehension" but as very best I can tell the LLM's version is "encoded tokens into vector space" and not "encoded into semantic graph"

I welcome the hair-splittery that is sure to follow about what it means to "understand" anything

hugh-avherald 3 hours ago|||

It is possible that such a human wouldn't come up with anything new, even if they could.

bcrosby95 11 hours ago|||

The article is discussing working in AI innovation vs focusing on getting more and better data. And while there have been key breakthroughs in new ideas, one of the best ways to increase the performance of these systems is getting more and better data. And how many people think data is the primary avenue to improvement.

It reminds me of an AI talk a few decades ago, about how the cycle goes: more data -> more layers -> repeat...

Anyways, I'm not sure how your comment relates to these two avenues of improvement.

jjtheblunt 11 hours ago|||

> I will respectfully disagree. All "new" ideas come from old ideas.

The insight into the structure of the benzene ring famously came in a dream, hadn't been seen before, but was imagined as a snake biting its own tail.

troupo 8 hours ago||

And as we all know, it came in a dream to a complete novice in chemistry with zero knowledge of any old ideas in chemistry: https://en.wikipedia.org/wiki/August_Kekul%C3%A9

--- start quote ---

The empirical formula for benzene had been long known, but its highly unsaturated structure was a challenge to determine. Archibald Scott Couper in 1858 and Joseph Loschmidt in 1861 suggested possible structures that contained multiple double bonds or multiple rings, but the study of aromatic compounds was in its earliest years, and too little evidence was then available to help chemists decide on any particular structure.

More evidence was available by 1865, especially regarding the relationships of aromatic isomers.

[ Kekule claimed to have had the dream in 1865 ]

--- end quote ---

The dream claim came from Kekule himself, 25 years after his proposal, which he had to modify 10 years after he proposed it.

gametorch 12 hours ago||

Exactly!

Can you imagine if we applied the same gatekeeping logic to science?

Imagine you weren't allowed to use someone else's scientific work or any derivative of it.

We would make no progress.

The only legitimate defense I have ever seen here revolves around IP and copyright infringement, which I couldn't care less about.

Night_Thastus 9 hours ago||

Man I can't wait for this '''''AI''''' stuff to blow over. The back and forth gets a bit exhausting.

cadamsdotcom 7 hours ago||

What about actively obtained data - models seeking data, rather than being fed. Human babies put things in their mouths, they try to stand and fall over. They “do stuff” to learn what works. Right now we’re just telling models what works.

What about simulation: models can make 3D objects so why not give them a physics simulator? We have amazing high fidelity (and low cost!) game engines that would be a great building block.

What about rumination: behind every Cursor rule for example, is a whole story of why a user added it. Why not take the rule, ask a reasoning model to hypothesize about why that rule was created, and add that rumination (along with the rule) to the training data. Providing opportunities to reflect on the choices made by their users might deepen any insights, squeezing more juice out of the data.

Centigonal 7 hours ago||

Simulation and embodied AI (putting the AI in a robotic arm or a car so it can try stuff and gather information about the results) are very actively being explored.

cadamsdotcom 6 hours ago||

What about at inference time? ie. in response to a query.

We let models write code and run it. Which gives them a high chance of getting arithmetic right.

Solving the “crossing the river” problem by letting the model create and run a simulation would give a pretty high chance of getting it right.
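
For a sense of how small such a simulation can be, here is a minimal sketch for the classic farmer/wolf/goat/cabbage version of the puzzle (an assumption on my part about which variant is meant). It is a plain breadth-first search over bank states, the kind of thing a model could plausibly write and run at inference time instead of reasoning in prose; all the names here are illustrative.

```python
from collections import deque

ITEMS = ("farmer", "wolf", "goat", "cabbage")

def is_safe(left):
    """A state is safe if no unattended bank has wolf+goat or goat+cabbage."""
    right = set(ITEMS) - left
    for bank in (left, right):
        if "farmer" not in bank:
            if {"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank:
                return False
    return True

def solve():
    """Breadth-first search; returns the shortest list of cargo per crossing."""
    start = frozenset(ITEMS)   # everything starts on the left bank
    goal = frozenset()         # done when the left bank is empty
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        left, path = queue.popleft()
        if left == goal:
            return path
        farmer_on_left = "farmer" in left
        bank = left if farmer_on_left else set(ITEMS) - left
        # The farmer crosses alone (None) or with one item from his own bank.
        for cargo in [None] + sorted(bank - {"farmer"}):
            moving = {"farmer"} | ({cargo} if cargo else set())
            new_left = frozenset(left - moving if farmer_on_left else left | moving)
            if new_left not in seen and is_safe(new_left):
                seen.add(new_left)
                queue.append((new_left, path + [cargo or "nothing"]))
    return None

if __name__ == "__main__":
    for step, cargo in enumerate(solve(), start=1):
        print(step, "farmer crosses with", cargo)
```

The search space is only 16 states, so the hard part for a model is not the compute but deciding that a simulation is the right tool and encoding the constraints correctly.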

kevmo314 7 hours ago||

That would be reinforcement learning. The juice is quite hard to squeeze.

cadamsdotcom 6 hours ago||

Agreed for most cases.

Each Cursor rule is a byproduct of tons of work and probably contains lots that can be unpacked. Any research on that?

kevmo314 51 minutes ago||

Yeah, at a very high level it's similar to an actor-critic reinforcement learning algorithm. The rule text is a value function and one could build a critic model that takes as input the rule text and the main model's (the actor's) output to produce a reward.

This is easier said than done though because this value function is so noisy it's often hard to learn from it. And also whether or not a response (the model output) matches the value function (the Cursor rules) is not even that easy to grade. It's been easier to train the chain-of-thought style reasoning since one can directly score it via the length of thinking.
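
To make the shape of that concrete, a toy sketch under heavy assumptions: the "actor" is just a softmax over three canned responses, the "critic" is a keyword check standing in for a learned model, and the rule text is hypothetical. A real critic would be a noisy learned model, which is exactly where the difficulty described above comes from; this only shows the wiring.

```python
import math

RULE = "Never apologize; answer directly."  # hypothetical Cursor-style rule

CANDIDATES = [
    "Sorry about that! Here is the fix.",
    "Here is the fix.",
    "I apologize for the confusion. Here is the fix.",
]

def critic(rule, output):
    """Toy stand-in for a learned critic: 1.0 if the output obeys the rule."""
    banned = ("sorry", "apolog") if "apolog" in rule.lower() else ()
    return 0.0 if any(word in output.lower() for word in banned) else 1.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=200, lr=0.5):
    """Gradient ascent on expected reward for a softmax policy (the actor)."""
    logits = [0.0] * len(CANDIDATES)
    rewards = [critic(RULE, c) for c in CANDIDATES]
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))  # expected reward
        # Exact gradient of expected reward w.r.t. logit i is p_i * (r_i - baseline).
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)

if __name__ == "__main__":
    for prob, text in zip(train(), CANDIDATES):
        print(round(prob, 3), text)
```

Because the action set is tiny, the update uses the exact expected gradient rather than sampled rollouts; with a real language model you would have to sample, and the variance from a noisy critic is what makes the juice hard to squeeze.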

This new paper covers some of the difficulties of language-based critic models: https://openreview.net/pdf?id=0tXmtd0vZG

Generally speaking, the algorithm and approach is not new. Being able to do it in a reasonable amount of compute is the new part.

ks2048 11 hours ago||

The latest LLMs are simply multiplying and adding various numbers together... Babylonians were doing that 4000 years ago.

bobson381 11 hours ago||

You are just a lot of interactions of waves. All meaning is assigned. I prefer to think of this like the Goedel generator that found new formal expressions for the Principia - because we have a way of indexing concept-space, there's no telling what we might find in the gaps.

thenaturalist 8 hours ago||

But on clay tablets, not in semi-conductive electron prisons separated by one-atom-thick walls.

Slight difference to those methods, wouldn't you agree?

geysersam 11 minutes ago||

No it's exactly the same. Everything old is new again...

ctoth 12 hours ago|

Reinforcement learning from self-play/AlphaWhatever? Nah must just be datasets. :)

NitpickLawyer 12 hours ago||

And architecture stuff like actually useful long context. Whatever they did with gemini 2.5 is miles ahead in long context useful results compared to the previous models. I'd be very surprised if gemini 2.5 is "just" gemini 1 w/ better data.

shwouchk 7 hours ago||

i don't know what all the hype is with gemini 2.5, at least the currently running instance. from my experience at least in conversation mode, it cannot remember my instructions to avoid apologies and similar platitudes from either the “persona”, personal instructions, or from one message to the next.

grumpopotamus 12 hours ago|||

https://en.wikipedia.org/wiki/TD-Gammon

zahlman 9 hours ago|||

For that matter, https://gist.github.com/deebs67/8fbcf8b127a63e70d4a3f8590c97... .

Y_Y 11 hours ago|||

You raise a really interesting point. I'm sure it's just missed my notice, but I'm not familiar with any projects from antediluvian AI that have been resurrected to run on modern hardware and see where they'd really asymptote if they'd had the compute they deserved.

rsfern 1 hour ago|||

This paper, “Were RNNs All We Needed?”, explores this hypothesis a bit, finding that some pre-transformer sequence models can match transformers when trained at appropriate scale. Though they did have to make some modifications to unlock more parallelism.

https://arxiv.org/abs/2410.01201

FeepingCreature 11 hours ago|||

To be fair, usually those projects would need considerable work to be ported to modern multicore machines, let alone GPUs.

genewitch 10 hours ago||

can you name a couple so i can see how much work is involved? markov chains compile fast and respond fast, sure, and neural nets train pretty quick too, so i'm wondering where the cutoff is; expert systems?

energy123 4 hours ago|||

Self-play gives you a large explosion of data.

nyrikki 11 hours ago||

Big difference between a perfect information, completely specified zero sum game and the real world.

As a simple analogy, read out the following sentence multiple times, stressing a different word each time.

"I never said she stole my money"

Note how the meaning changes and is often unique?

That is a lens into the frame problem and its inverse, the specification problem.

The above problem quickly becomes tower-complete, and recent studies suggest that RL is reinforcing or increasing the weight of existing patterns.

As the open domain frame problem and similar challenges are equivalent to HALT, finding new ways to extract useful information will be important for generalization IMHO.

Synthetic data is useful, but not a complete solution, especially for tower problems.

genewitch 10 hours ago||

The one we use is "I always pay my taxes"

and as far as synthetic vs real data, there's a lot of gaps in LLM knowledge; and vision models suffer from "limited tags", which used to have workarounds with textual embeddings and the like, but those went by the wayside as LoRA, controlnet, etc. appeared.

There's people who are fairly well known that LLMs have no idea about. There's things in books i own that the AI confidently tells me are either wrong or don't exist.

That one page about compressing 1 gig wikipedia as small as possible implicitly and explicitly states that AI is "basically compression" - and if the data isn't there, it's not in the compressed set (weights) either.

And i'll reply to another comment here, about "24/7 rolling/ for looped" AI - i thought of doing this when i first found out about LLMs, but context windows are the enemy, here. I have a couple of ideas about how to have a continuous AI, but i don't have the capital to test it out.
