Posted by Tycho87 3 days ago
>NOTE: This only works on desktop Chrome 129+ currently. Firefox and Safari will hopefully be [supported soon](link), but currently even Firefox Nightly and Safari Technology Preview do not work.
This is great, especially with that link! Thank you! But please say when "currently" is, e.g. add an "(Oct 2024)". Information like this is time-sensitive but rarely kept up to date, and it's often years stale with no easy way for visitors to tell.
And when it's recent, it also tells people that the project is active.
> The chrome://flags/#enable-unsafe-webgpu flag must be enabled (not enable-webgpu-developer-features). Linux experimental support also requires launching the browser with --enable-features=Vulkan.
https://github.com/gpuweb/gpuweb/wiki/Implementation-Status#...
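For the Linux case, that boils down to a launch command along these lines (a rough sketch, not from the project docs; the binary name varies per install):

```python
import subprocess

# Launch Chrome with WebGPU enabled, per the flags quoted above.
# "google-chrome" is an assumption; yours may be "chromium", etc.
subprocess.run([
    "google-chrome",
    "--enable-unsafe-webgpu",    # CLI counterpart of chrome://flags/#enable-unsafe-webgpu
    "--enable-features=Vulkan",  # the experimental Linux requirement
])
```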
I do this quite often. I probably shouldn’t, though. It’s only useful if you’re looking at commit logs or have an inline ‘last changed by [author] on [date]’ helper in your IDE.
Then again, even that could be made wrong by future edits.
But it's a few extra steps (depending on the UI), and many will not take those steps. They'll just trust it (far beyond when it's relevant), or think "that's probably old" and doubt it (immediately, because old docs are so common).
It's relatively minor, but it's extremely easy to prevent, and just a better habit when communicating with the future.
It loaded my 50MB .ply file almost instantly. Orbiting around the scene is extremely smooth and everything is free of flickering or artifacts.
I'd never tried training a Gaussian splat from images/video myself before, but this tool makes me want to give it a go.
Training a splat requires a lot less setup with this, but it does still require running COLMAP (https://github.com/colmap/colmap) first, which is a big barrier... one thing at a time!
How expensive is the COLMAP step to run? I was also really impressed with the speed in the demo (though I assumed the training shown was the only step).
Could you ELI5 what the training is versus what the COLMAP part is?
COLMAP looks at all your photos and works out where each one was taken from (the camera poses), plus a rough point cloud of the scene. The training then takes this information to make a 3D model out of it, visually matching all your photos.
COLMAP can sadly still be quite expensive and a hassle, on the order of half an hour as opposed to seconds. There are modern alternatives like GLOMAP (https://lpanaf.github.io/eccv24_glomap/), or even deep-learning-based systems like DUSt3R (https://github.com/naver/dust3r).
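If you want to script that step, the COLMAP Python bindings cover the whole pipeline. A minimal sketch (generic pycolmap usage, nothing Brush-specific; check the pycolmap docs for current signatures):

```python
import pathlib
import pycolmap  # pip install pycolmap

images = pathlib.Path("photos")  # your input images
out = pathlib.Path("colmap_out")
out.mkdir(exist_ok=True)
db = out / "database.db"

pycolmap.extract_features(db, images)  # detect features in each image
pycolmap.match_exhaustive(db)          # match features across image pairs
# Solve for camera poses + a sparse point cloud (what splat training consumes).
maps = pycolmap.incremental_mapping(db, images, out)
maps[0].write(out)
```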
This is definitely still a big blocker to adoption. The goal is to get to a more all-in-one system. The splatting optimization can also help align cameras, if they don't start out entirely random, so any system to quickly provide a good "initial guess" will help here. At least for mobile devices, initialization from ARCore / ARKit poses should be enough.
Keep an eye out :)
I'm sure there are others as well
Hard at work making performance better - the "main" kernels are at least as fast as gSplat's, so now the other overheads need to be removed.
That, and making splatting train more efficiently in general; lots of compute is wasted on small steps.
PS: the web version takes a minute to warm up and is generally slower, so do try a native version if you haven't yet!
The above includes the explanation. The final result is here:
Not much widespread use right now - possible commercial use cases are things like real estate walkthroughs, or maybe replacing Google Street View with something more interactive.
Imagine one of those house tours on Zoopla on steroids, or Street View but smoother.
They can be used for video special effects, for 3D images/video, and for VR. The technology is nascent but shows promise.
Sometimes I feel like it should be able to get more detail in certain areas, but it's always looking at things holistically.
I wish you could give it a 3D bounding box and say "work on this area only", which I think should be possible?
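In a PyTorch-style training loop, that could be as simple as masking gradients by splat position. A rough sketch of the idea (hypothetical, not anything Brush actually exposes):

```python
import torch

def inside_box(means: torch.Tensor, box_min, box_max) -> torch.Tensor:
    """Boolean mask: which gaussian centers (N, 3) lie inside an
    axis-aligned bounding box."""
    lo = torch.as_tensor(box_min, dtype=means.dtype, device=means.device)
    hi = torch.as_tensor(box_max, dtype=means.dtype, device=means.device)
    return ((means >= lo) & (means <= hi)).all(dim=-1)

# Dummy splat parameters standing in for a real scene.
means = torch.randn(10_000, 3, requires_grad=True)
opacities = torch.randn(10_000, 1, requires_grad=True)

loss = means.square().sum() + opacities.square().sum()  # stand-in loss
loss.backward()

# Zero gradients outside the region of interest: only splats inside the
# box get refined this step, everything else stays frozen.
mask = inside_box(means.detach(), (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
with torch.no_grad():
    for p in (means, opacities):
        p.grad[~mask] = 0.0
```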
I think if you're reconstructing your own data, the algorithm should ideally just work, without input.
But, imagine you could add in generated videos. Lay down a camera path, tell it what to generate, and add it to the reconstruction. A brush stroke, one might say ;)
And then, what's the output?
Otherwise I find the whole website far too "involved" to understand what it's doing at all. Someone who already understands the area won't have my trouble, of course.
The inputs are (1) images and (2) a pose for each. The usual way to get poses for your images is https://github.com/colmap/colmap.
The output is a 3D model. Specifically a "Gaussian splat", which is a sort of fuzzy point cloud. There are some tools out there to view & edit these (besides Brush), e.g. https://playcanvas.com/supersplat/editor.
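These splat files are usually plain PLY under the hood, so you can peek inside with a generic reader. A small sketch (assuming the per-gaussian layout popularized by the original 3DGS reference implementation; other tools may differ):

```python
from plyfile import PlyData  # pip install plyfile

ply = PlyData.read("scene.ply")
verts = ply["vertex"]

print(f"{verts.count} gaussians")
print([p.name for p in verts.properties])
# Typically: x, y, z (position), scale_0..2 (log scales), rot_0..3
# (quaternion), opacity (pre-activation), f_dc_0..2 / f_rest_* (SH color).
```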
Begone, lidar units, for basic robot tasks! All praise normal cameras! (Though it's far too slow to run on autonomous cars, since the environment changes so rapidly.)
IIRC some researchers had started to back the gaussians with a mesh to provide an editable artifact that would allow the gaussians to be moved and manipulated.
Is this anywhere near being a standard feature yet?
edit: i.e. https://arxiv.org/abs/2402.04796