The gauge-reading example here is great, but in reality of course having the system synthesize that Python script, run the CV tasks, come back with the answer etc. is currently quite slow.
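For a sense of what such a synthesized script boils down to, here is a minimal sketch of just the final step: turning a detected needle angle into a reading. The script the system actually generates isn't shown in this thread, so the function name, the angle convention and the calibration numbers below are all made up for illustration, and the slow CV part (finding the needle) is assumed to have already happened.

```python
def needle_angle_to_value(angle_deg, min_angle_deg, max_angle_deg,
                          min_value, max_value):
    """Linearly interpolate a gauge reading from a needle angle.

    Hypothetical helper: assumes the dial scale is linear and that the
    angles of the minimum and maximum tick marks are already known.
    """
    span = max_angle_deg - min_angle_deg
    if span == 0:
        raise ValueError("min and max angles must differ")
    # Fraction of the way along the dial, clamped to the printed scale.
    fraction = (angle_deg - min_angle_deg) / span
    fraction = min(1.0, max(0.0, fraction))
    return min_value + fraction * (max_value - min_value)


# Illustrative calibration: a 0-10 scale swept between 45 and 315 degrees.
reading = needle_angle_to_value(135.0, 45.0, 315.0, 0.0, 10.0)
print(f"gauge reads about {reading:.2f}")
```

The arithmetic is trivial; the latency complained about above would come from generating the script and running the vision steps that produce the angle in the first place.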
Once things go much faster, you can also start to use image generation to have models extrapolate possible futures from photos they take, then describe those futures back to themselves and make decisions based on that, in loops like this. I think the assumption is that our brains do similar things unconsciously, before we integrate the results into our conscious conception of mind.
I'm really curious what things we could build if we had 100x or 1000x inference throughput.
Planning ahead through simulation, for example, seems to be a very good tool in neural-network-based architectures.
A few robot legs and arms, big battery, off-the-shelf GPU. Solar panels.
Prompt: "Take care of all this land within its limits and grow some veggies."
Or it could turn out to look like satoyama (Japanese peasant forests) or it could be more similar to the crop rotation that was traditionally practiced in many parts of Central Africa where roots were important.
In Russia, before the Soviets forced "modern scientific agriculture" on peasants to modernize, they practiced things like contour farming (where they interplanted rows of crops along the contours of the land to slow water down) and maslins (where they intermixed multiple varieties of wheat and barley in the same patch). Now contour farming is an active area of research for its ability to prevent topsoil loss and build soil health, while maslins provide superior yield stability and use little to no pesticides.
That's not even getting into the 40,000 to 120,000 varieties of rice we've documented, most of which are hyper-adapted to a very specific location, often even a single village.
My point is there is no one way to take care of a plot of land. It's all relative to a number of factors beyond just the abiotic characteristics of the land itself. Your goals and intentions matter and you will always find localized unique adaptations.
Hyundai now owns Boston Dynamics and is pushing to get the robots into their factories.
The safety guidelines are interesting: they treat them as a goal that they are aspiring to achieve, which seems realistic. It's not quite ready for prime time yet.
It was about Google's PaLM-E evolution and progress. It basically has two models: one controls the robot, the other is an LLM, and they are combined in some attention layer.
So there might be awesome progress behind the scenes, just not ready for the general public.
That's a bit exaggerated, no? Early Roombas would get tangled in socks, drag pet poop all over the floor, break glass stuff and so on, and yet the market accepted that and evolved, and now we have plenty of cleaning robots from various companies, including cheap spying ones from China.
I actually think that there's a lot of value in being the first to deploy bots into homes, even if they aren't perfect. The amount of data you'd collect is invaluable and, by the looks of it, can't be synthetically generated in a lab.
I think the "safer" option is still the "bring them to factories first, offices next and homes last" approach, but anyway I'm sure someone will jump straight to home deployments.
My concern with a household robot is not the dishwasher but the TV screen, the glass door, the glass table, animals (fish/aquarium) etc. that the robot might walk through, reach through or fall onto.
My non-AI dishwasher can't even always keep the water inside. Nothing is perfect.
Depending on what the rate of breaking dishes is, this would be a massive improvement on me, a human being, since I break a really important dish I needed to use about twice a month on average.
Not here to shame you for it, for the record.
That's me ;_;
VLA models essentially take a webcam screenshot + some text (think "put the red block in the right box") and output motor control instructions to achieve that.
Note: "Gemini Robotics-ER" is not a VLA, though Gemini does have a VLA model too: "Gemini Robotics".
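To make the "image + text in, motor commands out" shape concrete, here is a rough sketch of that interface as plain Python types with a stub policy. None of this is any real model's API: `Observation`, `Action`, `StubPolicy` and `control_loop` are hypothetical names, and the stub ignores its inputs and emits zero motion where a real VLA would run a network.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Observation:
    image_rgb: bytes      # one camera frame (encoding left unspecified here)
    instruction: str      # e.g. "put the red block in the right box"


@dataclass
class Action:
    joint_deltas: List[float]   # hypothetical per-joint position changes
    gripper_closed: bool


class VLAPolicy(Protocol):
    def act(self, obs: Observation) -> Action: ...


class StubPolicy:
    """Placeholder for a real model: ignores the input, emits no motion."""

    def __init__(self, num_joints: int) -> None:
        self.num_joints = num_joints

    def act(self, obs: Observation) -> Action:
        return Action(joint_deltas=[0.0] * self.num_joints,
                      gripper_closed=False)


def control_loop(policy: VLAPolicy,
                 get_observation: Callable[[], Observation],
                 send_action: Callable[[Action], None],
                 steps: int) -> None:
    """Closed loop: observe, ask the policy for an action, execute, repeat."""
    for _ in range(steps):
        send_action(policy.act(get_observation()))


# Usage with fake hardware: a constant frame in, actions collected in a list.
sent: List[Action] = []
frame = Observation(image_rgb=b"", instruction="put the red block in the right box")
control_loop(StubPolicy(num_joints=6), lambda: frame, sent.append, steps=3)
```

The point of the sketch is only the loop structure: the policy is called once per camera frame, so its inference speed directly bounds how fast the robot can react.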
Nothing was reported on the Google status page, and not even the CLI is responding; it's just left there waiting for an answer that will never arrive, even after 10 minutes.
Anyway, cool.
Of course this is for counting animal legs while giving coordinates and reading analog clocks. Not coding or solving puzzles. I imagine the ratio of image performance to model size is very high for this model.
And, I was disappointed to see that pointing was just giving x,y coords. I wanted to see robots pointing at stuff.
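For what it's worth, using those coordinates is a small step. Below is a minimal sketch that maps returned points to pixel positions. It assumes, and this is an assumption rather than something stated in this thread, that the model returns JSON items with a `point` in `[y, x]` order normalized to a 0-1000 range plus a `label`. The function name and the sample response are invented for illustration.

```python
import json


def points_to_pixels(response_text, width, height, scale=1000):
    """Convert normalized points to (label, x_px, y_px) tuples.

    Assumed input format (not confirmed in the thread):
    [{"point": [y, x], "label": "..."}] with coordinates in 0..scale.
    """
    results = []
    for item in json.loads(response_text):
        y_norm, x_norm = item["point"]
        # Scale to the image size and clamp to the last valid pixel index.
        x_px = min(width - 1, int(x_norm / scale * width))
        y_px = min(height - 1, int(y_norm / scale * height))
        results.append((item["label"], x_px, y_px))
    return results


# Made-up response for a 640x480 frame.
example = '[{"point": [500, 250], "label": "red block"}]'
pixels = points_to_pixels(example, width=640, height=480)
print(pixels)
```

Getting from a pixel to a robot physically pointing at the object would still need depth information and a calibrated camera, which this sketch does not attempt.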