
Posted by sandslash 7/1/2025

Fei-Fei Li: Spatial intelligence is the next frontier in AI [video](www.youtube.com)
296 points | 153 comments
myspeed 7/3/2025|
Most of our spatial intelligence is innate, developed through evolution. We're born with a basic sense of gravity and the ability to track objects. When we learn to drive a car, we simply reassign these built-in skills to a new context.
pzo 7/3/2025|
Is there any research on this? It would mean we pass some knowledge down in our genes, so that offspring are born with some of their ancestors' knowledge. Would that mean the weights are stored in DNA?
cma 7/3/2025||
Horses can be blindfolded at birth and, when the blindfold is removed, do basic navigation with no time for any training. Animals that are not visually precocious, like cats, will never develop a functioning visual system if they miss a critical developmental period without natural visual input.

Baby chicks can do bipedal balance pretty much as soon as they dry off.

Wood ducks can visually imprint very soon after hatching and drying off, a couple of hours after birth, with very limited visual data up to that point and no intervening sleep cycles.

We as humans have natural reactions to snake-like shapes, etc., even before encountering their danger or learning about it from social cues. Babies

magicalhippo 7/3/2025|||
I've pondered this often, especially with kangaroos, where the half-developed fetus can climb up into the pouch.

Clearly we're just hardwired for certain tasks, in such a way that the function is primarily dictated by topology.

This weight-agnostic neural network page [1] explores the idea, though it obviously isn't the true answer.

[1]: https://weightagnostic.github.io/

jampekka 7/3/2025|||
It's not clear whether humans have natural reactions to snakes. https://link.springer.com/article/10.11133/j.tpr.2013.63.4.0...
cma 7/3/2025||
If not, another innate visual preference in newborns is for faces with open eyes and direct gaze:

https://pubmed.ncbi.nlm.nih.gov/17030037/

alganet 7/3/2025||
How can I be sure that spatial intelligence AIs will not just be intricate sensing that ultimately fails to demonstrate actual intelligence?

> "trilobite"

The trilobite ancestor had a nervous system before it had an eye. It was able to make decisions and interact with the environment before the ability to see or speak a language.

It feels to me like this basic step is still missing. We haven't even crossed the first AI frontier yet.

mehulashah 7/3/2025||
“Forget about what you’ve done in the past. Forget about what others think of you. Just hunker down and build. That is my comfort zone.”

Enough said.

owenpalmer 7/3/2025||
While spatial intelligence is certainly a major limitation of current AI systems, I have been able to get LLMs to do quite impressive things.

Here's an on-the-fly video I made (no retakes) of Claude generating a Godot scene file.

https://youtu.be/2gARJpDG7Jo?si=W4rlISO-J4EPJYyG

AStrangeMorrow 7/3/2025|
Yeah, funnily enough, I did a project where we had an LLM-based interface to an in-house 3D parametric modeling system, and it did fairly well.

However, I’ve been trying to use LLMs, both as orchestrators and to write code, for 2D optimization problems with many spatial relationships, and they have done terribly.

I’m talking about it generating thousands of lines over many rounds of prompting/iteration that solve maybe 30% of the problem (and only the very easy 30% of cases) while completely messing up the rest. When I wrote that code myself, in under 1,000 lines, the “30% part” was maybe 3% of the total code. Even when I basically provided pseudocode for specific parts of the problem, chances are these LLM solutions would still have many blind spots or issues.

The thing is, that is a 2D problem for which there are basically no resources online, and all the slightly similar problems have carefully handcrafted, specialized solutions. So I think it has no good frame of reference for how to solve the problem.

wolframhempel 7/3/2025||
We're actually working on a practical implementation of aspects of what Fei-Fei describes, although with a narrower focus on optimizing operations in physical space (mining, energy, defense, etc.): https://hivekit.io/about/our-vision/
meerab 7/4/2025||
View the transcript here:

https://videotobe.com/play/youtube/_PioN-CpOP0

yellow_postit 7/3/2025||
Recent paper: “How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks” [1]

[1] https://arxiv.org/abs/2507.01955

Nurbek-F 7/3/2025||
Isn't this what Karpathy has been advocating for since the early days of Tesla Vision?
czbond 7/3/2025||
Thanks for joining the obvious, Fei-Fei, about 5 years late. There are spatial web standards, approved by the IEEE, that have been in the works for years.

https://spatialwebfoundation.org/

sota_pop 7/3/2025||
Wow, I watched a presentation of this idea two years ago. It reads like classic flavor-of-the-week jargon soup engineered to be catnip for unsuspecting VCs, i.e., combining the buzzwords HTTP, IoT, AI, and blockchain with the notion of a “digital twin” from the AEC industry. It was given by a guy who seemed extremely excited (and heavily energized, possibly chemically). He tried to describe how this differs from HTTP. I’m highly confident no one in the room was able to make anything of it.

Before any questions could be asked, the presenter said “OK, I need to run to give this presentation at the World Economic Forum in Davos now.”, and quite literally ran out of the room.

hiddencost 7/3/2025||
You do know who she is, right?
czbond 7/3/2025||
Of course... but that doesn't mean she was early in forming that viewpoint. She's just stating the near-future obvious.
ninetyninenine 7/3/2025|
The next frontier is eliminating hallucinations?

Once that happens it’s all over.
