Posted by meetpateltech 11/12/2025

Marble: A Multimodal World Model (www.worldlabs.ai)
267 points | 83 comments
abixb 11/13/2025|
As someone with a barebones understanding of "world models," how does this differ from sophisticated game engines that generate three-dimensional worlds? Is it simply the use of a transformer architecture to generate the 3D world vs. a static/predictable script as in game engines (learned dynamics vs. deterministic simulation mimicking 'generation')? Would love an explanation from SMEs.
whizzter 11/13/2025||
Games are still mostly polygon-based due to tooling (even Unreal's Nanite is a specialized way of handling polygons). Some engines have tried voxels (Teardown; Minecraft generates polygons and so falls into the previous category as far as rendering goes) or even implicit surfaces composed from SDF-style primitives (Dreams on PlayStation and, more recently, unbound.io).
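
To make the SDF idea concrete: the whole scene is a single function from a 3D point to a signed distance, and primitives compose with min/smooth-min. A toy Python sketch (names and constants are illustrative, not from Dreams or unbound.io):

    import math

    def sphere_sdf(p, center, radius):
        # Signed distance to a sphere: negative inside, positive outside.
        return math.dist(p, center) - radius

    def smooth_min(a, b, k=0.25):
        # Polynomial smooth minimum: blends two distance fields so their
        # surfaces merge with a rounded seam instead of a hard crease.
        h = max(k - abs(a - b), 0.0) / k
        return min(a, b) - h * h * k * 0.25

    def scene_sdf(p):
        # The whole scene is one function: point in, signed distance out.
        ground = p[1]  # distance to the y = 0 plane
        blob = smooth_min(sphere_sdf(p, (0.0, 0.5, 0.0), 0.5),
                          sphere_sdf(p, (0.6, 0.4, 0.0), 0.4))
        return min(ground, blob)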

All of these have fairly "exact" representations, and the generation techniques are also often fairly "exact", trying to create worlds that won't break physics engines (a big part) or rendering engines. These are mostly hand-crafted algorithms, though nothing really stopped neural networks from being used at a higher level.

One important detail of most generation systems in games is that they are built to be controllable, either to work with game logic (think of how Minecraft generates the world to include biomes, villages, etc.) or to be more or less artist-controllable.

3D scanning has often relied on point clouds, but they were heavy, full of holes, etc., and long infeasible for direct rendering, so many methods were developed to turn them into decent polygon meshes.

NeRFs and Gaussian splatting (GS) started appearing a few years back. These are more "approximate" and skip polygon generation entirely, instead quantizing the world into neural-network "fields" (NeRF) or fuzzy point clouds (GS). Visually they have been impressive, since they capture "real" images well.
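
Roughly, each splat is a fuzzy ellipsoid with a position, covariance, opacity, and view-dependent color; rendering projects it to a 2D Gaussian and alpha-composites the depth-sorted result per pixel. A simplified sketch of that compositing step (real renderers tile the screen and get view-dependent color from spherical harmonics):

    import numpy as np

    def splat_alpha(px, mean2d, cov2d_inv, opacity):
        # Opacity contribution of one projected 2D Gaussian at a pixel.
        d = px - mean2d
        return opacity * np.exp(-0.5 * d @ cov2d_inv @ d)

    def composite_pixel(px, splats):
        # Front-to-back alpha compositing over depth-sorted splats.
        color, transmittance = np.zeros(3), 1.0
        for s in sorted(splats, key=lambda s: s["depth"]):
            a = splat_alpha(px, s["mean2d"], s["cov2d_inv"], s["opacity"])
            color += transmittance * a * s["rgb"]
            transmittance *= 1.0 - a
            if transmittance < 1e-4:  # pixel is effectively opaque
                break
        return color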

This system is built on GS, probably because that meshes well with the token and diffusion techniques used to encode its inputs (images, text).

They do mention mesh exports (there has been some research into polygon generation from GS).

If the system scales to huge worlds this could change game dev, and the control methods suggest some ambition in that direction. But it would probably need more control and world/asset management, since you need predictability with respect to existing content to produce over the long term (same as with code agents).

ehnto 11/13/2025|||
Your latter point is what makes me think this doesn't have comprehensive legs, just niche usage.

A typical game has thousands of hand-placed nodes in 3D space that do things like place lights, trigger story beats, account for physics and collisions, etc. That wouldn't change with Gaussian splats, but if you needed to edit the world, then even with deterministic generation the whole world might change, and all your gameplay nodes would now be misplaced.

That doesn't matter for some games, but I think it does matter for most.

whizzter 11/13/2025||
Oh, I agree fully. This was probably created by researchers and/or "AI bros" with less experience than actual game developers (though the fact that they've added a way of placing objects at all is far more than most other tools, with their text focus, have provided).

That said, all those collisions, triggers, lights, etc. could be authored alongside blockouts in Unity, Godot, or some other editor capable of creating levels that integrates with the rest of the game-authoring process.

If they create a way to keep the generation contexts (or rebuild them from marker objects with prompts that are kept in the level editor and continuously re-imported) and allow a sane way to re-generate and keep chunks, then I feel this could be fairly bad for world artists (yes, they'd probably still be needed to adjust things so they don't look like total slop).

AlexisArgyriou 11/13/2025||||
You could in theory combine point clouds and Nanite: cull sub-pixel points and generate geometry on the fly by filling the voids between the remaining points with polygons. The main issue is bandwidth: GPUs can barely handle Nanite, and this would be at least an order of magnitude more complex to do at runtime. Nanite does a lot of offline precomputation, storing some sort of intermediate models, etc.
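
The screen-space test itself is cheap; the runtime void-filling is the expensive part. A rough sketch of the culling half (assumes a simple pinhole projection; names are illustrative):

    def projected_diameter_px(point_radius, view_depth, focal_px):
        # Approximate on-screen diameter of a point under pinhole projection.
        return 2.0 * point_radius * focal_px / view_depth

    def cull_subpixel(points, focal_px, min_px=1.0):
        # Keep only points whose footprint covers at least one pixel; the
        # survivors are what you'd bridge with generated polygons.
        return [p for p in points
                if projected_diameter_px(p["radius"], p["depth"], focal_px) >= min_px]
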
whizzter 11/13/2025||
I agree, but I don't think this work is meant for realtime creation (like those Google models) so much as offline authoring, so the fixups can be done later.
corimaith 11/13/2025|||
What does the Gaussian approach do that resolves the issues voxel engines have? I recall that once you wanted to do animation, it became a mess of computational complexity.
whizzter 11/13/2025||
GS does two things that make it great for _rendering_ and _world approximation_: it's a view-dependent "fuzzy" representation, so rendering-wise you don't need to fill in blanks in the reconstruction, and it also encodes view-dependent effects like reflections (which should help an AI model infer beyond-view details).

The issue with real voxels (not Minecraft-style) is that they fill fixed spaces, which can create gaps once you start animating. You probably have the same issues with GS (but that's probably why they offer exports).

mountainriver 11/13/2025|||
The model is predicting what the state of the world would look like after a given action.

Along with entertainment, they can be used for simulation training for robots, and they allow imagining potential trajectories.
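
In interface terms, such a model is a learned step function, and "imagining a trajectory" is just rolling it out without touching the real environment. A hypothetical sketch (predict_next is a placeholder, not any real API):

    def imagine_trajectory(model, state, policy, horizon=16):
        # Roll a learned dynamics model forward in imagination: the policy
        # picks an action, the model predicts the resulting state, and no
        # real robot or environment is involved.
        trajectory = [state]
        for _ in range(horizon):
            action = policy(state)
            state = model.predict_next(state, action)  # placeholder API
            trajectory.append(state)
        return trajectory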

echelon 11/13/2025|||
Marble is not that type of world model. It generates static Gaussian Splat assets that you can render using 3D libraries.
ghayes 11/13/2025|||
Whenever I see these and play with models like this (and the demos on this page), the movement in the world always feels like a dolly zoom [0]. Things in the distance tend to stay in the distance even as the camera moves toward them, and only the local area changes features.

[0] https://en.wikipedia.org/wiki/Dolly_zoom
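
The geometry behind the effect: a subject of width w at distance d fills the frame when w = 2 * d * tan(fov / 2), so a camera can dolly in while widening its FOV (or dolly out while narrowing it) and keep the subject the same apparent size. A quick check in Python:

    import math

    def fov_for_constant_framing(subject_width, distance):
        # FOV (radians) that keeps the subject filling the frame:
        # subject_width = 2 * distance * tan(fov / 2)
        return 2.0 * math.atan(subject_width / (2.0 * distance))

    # Dolly in on a 4 m wide subject; framing stays fixed as the FOV widens:
    for d in (10.0, 7.5, 5.0):
        fov = math.degrees(fov_for_constant_framing(4.0, d))
        print(f"distance {d:4.1f} m -> fov {fov:4.1f} deg")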

imtringued 11/13/2025|||
That's the thing about this: calling things "world models" only serves to confuse people, because "world" is such a loose word. Here it means "3D scene". When others use it, they may mean "screen-space physics model". In the context of LLMs it means something like "reasoning about real-world processes outside of text".
echelon 11/13/2025||
This "world model" is Image to Gaussian Splat. This is a static render that a web-based Gaussian Splat viewer then renders.

Other "world model"s are Image + (keyboard input) to Video or Streaming Images, that effectively function like a game engine / video hybrid.

hobofan 11/12/2025||
Duplicate: https://news.ycombinator.com/item?id=45902732
theiz 11/13/2025||
Nice tech! It would be great if this could also work from factual data, like design drawings. Then it could be used for BIM and regulatory purposes; for example, to show residents how a planned residential area will look, or to test the layout of a planned airport.
girfan 11/13/2025||
This seems very interesting. Timely, given that Yann LeCun's vision also seems to align with world models being the next frontier: https://news.ycombinator.com/item?id=45897271
lofties 11/13/2025|
An established founder claims X is the new frontier. X receives hundreds of millions in funding. Other, less established founders claim they are working on X too. VCs suffering from terminal FOMO pump billions more into X. X becomes the next frontier. The previous frontiers are promptly forgotten.
whizzter 11/13/2025|||
I think the terminology is a bit confusing here: this seems more graphics-focused, while I suspect a 10-year plan as mentioned by YLC revolves around re-architecting AI systems to be less reliant on LLM-style nets/refinements and to understand the world in a way that isn't as prone to hallucinations.
bee_rider 11/13/2025|||
What’s it going to do, take away funds from the otherwise extremely prudent AI sector?
dcl 11/13/2025||
What happens when you prompt one of these kinds of models with de_dust? Will it autocomplete the rest of the map?

edit: Just tried it and it doesn't, but it does a good job of creating something like a CS map.

padolsey 11/13/2025|
>What happens when you prompt one of these kind of models with de_dust?

Presumably de_dust2

echelon 11/13/2025||
You both should check out DiamondWM. It runs on Ubuntu and I think Windows, presuming you have an Nvidia GPU. It's exactly what you're talking about.

I linked it elsewhere in this thread.

culi 11/13/2025||
I just want to give it a picture of my house and have it show me what it could look like organized, so I know where to put everything.
Jayakumark 11/13/2025|
I have the same use case. I tried last month with the Gemini app and it was awful; most of the rendering was messed up.
morgango 11/13/2025||
Are we nearing the capability to build something like the Mind Game (from the Ender's Game books)?
john_minsk 11/13/2025||
This is great. Can I use it with an existing scan of my room to fill in the gaps, rather than a random world?

Update - yes, you can. To be tested.

john_minsk 11/13/2025|
You also can't regenerate from a new point of origin. Generate -> stop... no way to continue?

Update - it is a paid feature.

coolfox 11/13/2025||
This prompt seems to be blocked: "Los Angeles moments before an 8-mile-wide asteroid impacts." Others work, but with that one it's always 'too busy'.

It seems anything to do with asteroids (or explosions, I imagine) is blocked.

venom2001viper 11/13/2025|
It seems not all great researchers can create great companies or products.