
Posted by sandslash 5 days ago

Fei-Fei Li: Spatial intelligence is the next frontier in AI [video](www.youtube.com)
290 points | 152 comments | page 2
meerab 1 day ago|
View the transcript here

https://videotobe.com/play/youtube/_PioN-CpOP0

alganet 2 days ago||
How can I be sure that spatial intelligence AIs will not just be intricate sensing that ultimately fails to demonstrate actual intelligence?

> "trilobite"

The trilobite ancestor had a nervous system before it had an eye. It was able to make decisions and interact with the environment before the ability to see or speak a language.

It feels to me like this basic step is still missing. We haven't even crossed the first AI frontier yet.

sabman 3 days ago||
We've been working on this challenge in the satellite domain with https://earthgpt.app. It’s a subset of what Fei-Fei is describing, but it comes with its own unique issues, like handling multi-resolution sensors and imagery with hundreds of spectral bands. Think of it as computer vision, but in n dimensions.

Happy to answer questions if you're curious. PS. still in early beta, so please be gentle!

fnands 3 days ago||
Hey, cool project!

Do you actually pass the images to the model, or just the metadata/stats?

sabman 3 days ago||
Thanks! This live demo uses metadata and stats only. Right now we are testing ViTs and foundation models as well, but the quality of results from EO FMs hasn't been worth the inference cost so far. Early days though. We are also starting to fine-tune models for specific downstream tasks ourselves.
fnands 3 days ago||
Cool, makes sense.

Yeah, have you considered maybe looking into just running it on embeddings [1], instead of the imagery itself? Would save on most of the inference cost, at the cost of flexibility (i.e. you are locked into whatever embeddings have been created).

[1] https://developers.google.com/earth-engine/datasets/catalog/...

sabman 2 days ago||
Ah yes, we have been testing other embedding models but not Google's. I'll try this too. It's interesting that most of them are doing land cover classes, which is kinda solved already. We are also testing mixing agentic workflows with smaller directed prompts for users to provide the classes. Incidentally, we are Berlin based. We should grab a coffee :)
byteab 3 days ago||
Really interesting space
mehulashah 2 days ago||
“Forget about what you’ve done in the past. Forget about what others think of you. Just hunker down and build. That is my comfort zone.”

Enough said.

owenpalmer 3 days ago||
While spatial intelligence is certainly a major limitation of current AI systems, I have been able to get LLMs to do quite impressive things.

Here's an on-the-fly video I made (no retakes) of Claude generating a Godot scene file.

https://youtu.be/2gARJpDG7Jo?si=W4rlISO-J4EPJYyG

AStrangeMorrow 3 days ago|
Yeah, funnily enough, I did a project where we had an LLM-based interface to an in-house 3D parametric modeling system, and it did fairly well.

However, I’ve been trying to use LLMs, both as orchestrators and in other cases to write code, for 2D optimization problems with many spatial relationships, and they have done terribly.

I'm talking about generating 1000s of lines over many rounds of prompting/iteration that solve maybe 30% of the problem (and that 30% is the very easy cases) while completely messing up the rest. When I wrote that code myself, in less than 1000 lines, the “30% part” was maybe 3% of the total code. Even when I basically provided pseudocode to solve specific parts of the problem, chances are these LLM solutions would still have many blind spots or issues.

The thing is, that is a 2D problem for which there are basically no resources online, and the slightly similar problems all have carefully handcrafted, specialized solutions. So I think it has no good frame of reference for how to solve the problem.

wolframhempel 3 days ago||
We're actually working on a practical implementation of aspects of what Fei-Fei describes - although with a narrower focus on optimizing operations in the physical space (mining, energy, defense, etc.): https://hivekit.io/about/our-vision/
yellow_postit 3 days ago||
Recent paper: “How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks” [1]

[1] https://arxiv.org/abs/2507.01955

Nurbek-F 3 days ago||
Isn't this what Karpathy has been advocating for since the early days of Tesla Vision?
czbond 3 days ago||
Thanks for joining the obvious, Fei-Fei, about 5 years late. Spatial web standards, approved by the IEEE, have been in the works for years.

https://spatialwebfoundation.org/

sota_pop 2 days ago||
Wow, I watched a presentation of this idea two years ago. It reads like classic flavor-of-the-week jargon soup engineered to be catnip for unsuspecting VCs, i.e. combining the buzzwords HTTP, IoT, AI, blockchain, and the notion of a “digital twin” from the AEC industry. It was given by a guy who seemed extremely excited (and heavily energized, possibly chemically). The presenter tried to describe how this differs from HTTP. I’m highly confident no one in the room was able to make anything of it.

Before any questions could be asked, the presenter said, “OK, I need to run to give this presentation at the World Economic Forum in Davos now,” and quite literally ran out of the room.

hiddencost 3 days ago||
You do know who she is, right?
czbond 2 days ago||
Of course... that doesn't mean she was early in forming that viewpoint. Just stating the near-future obvious.
ninetyninenine 3 days ago|
The next frontier is eliminating hallucinations?

Once that happens it’s all over.
