The Waymo World Model

Posted by xnx 6 hours ago

575 points | 366 comments

mattlondon 4 hours ago|

Suddenly all this focus on world models by Deep mind starts to make sense. I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.

Google/Alphabet are so vertically integrated for AI when you think about it. Compare what they're doing - their own power generation , their own silicon, their own data centers, search Gmail YouTube Gemini workspace wallet, billions and billions of Android and Chromebook users, their ads everywhere, their browser everywhere, waymo, probably buy back Boston dynamics soon enough (they're recently partnered together), fusion research, drugs discovery.... and then look at ChatGPT's chatbot or grok's porn. Pales in comparison.

phkahler 2 hours ago||

Google has been doing more R&D and internal deployment of AI and less trying to sell it as a product. IMHO that difference in focus makes a huge difference. I used to think their early work on self-driving cars was primarily to support Street View in thier maps.

brokencode 1 hour ago|||

There was a point in time when basically every well known AI researcher worked at Google. They have been at the forefront of AI research and investing heavily for longer than anybody.

It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.

But they are in full gear now that there is real competition, and it’ll be cool to see what they release over the next few years.

hosh 1 hour ago|||

I also think the presence of Sergey Brin has been making a difference in this.

hungryhobbit 52 minutes ago|||

Please, Google was terrible about using the tech the had long before Sundar, back when Brin was in charge.

Google Reader is a simple example: Googl had by far the most popular RSS reader, and they just threw it away. A single intern could have kept the whole thing running, and Google has literal billions, but they couldn't see the value in it.

I mean, it's not like being able to see what a good portion of America is reading every day could have any value for an AI company, right?

Google has always been terrible about turning tech into (viable, maintained) products.

vinkelhake 31 minutes ago|||

Is there an equivalent to Godwin's law wrt threads about Google and Google Reader?

See also: any programming thread and Rust.

burgreblast 11 minutes ago||||

I never get the moaning about killing Reader. It was never about popularity or user experience.

Reader had to be killed because it [was seen as] a suboptimal ad monetization engine. Page views were superior.

Was Google going to support minimizing ads in any way?

DiggyJohnson 13 minutes ago||||

How is this relevant? At best it’s tangentially related and low effort

jamespo 30 minutes ago|||

Took a while but I got to the google reader post. Self host tt-rss, it's much better

refulgentis 1 hour ago|||

Ex-googler: I doubt it, but am curious for rationale (i know there was a round of PR re: him “coming back to help with AI.” but just between you and me, the word on him internally, over years and multiple projects, was having him around caused chaos b/c he was a tourist flitting between teams, just spitting out ideas, but now you have unclear direction and multiple teams hearing the same “you should” and doing it)

pstuart 41 minutes ago|||

That makes sense. A "secret shopper" might be a better way to avoid that but wouldn't give him the strokes of being the god in the room.

LightBug1 21 minutes ago|||

Oh ffs, we have an external investor who behaves like that. Literally set us back a year on pet nonsense projects and ideas.

smallnix 41 minutes ago|||

> It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.

I always thought they deliberately tried to contain the genie in the bottle as long as they could

AlfredBarnes 1 hour ago||||

It has always felt to me that the LLM chatbots were a surprise to Google, not LLMs, or machine learning in general.

raphlinus 1 hour ago||

Not true at all. I interacted with Meena[1] while I was there, and the publication was almost three years before the release of ChatGPT. It was an unsettling experience, felt very science fiction.

[1]: https://research.google/blog/towards-a-conversational-agent-...

hibikir 50 minutes ago|||

The surprise was not that they existed: There were chatbots in Google way before ChatGPT. What surprised them was the demand, despite all the problems the chatbots have. The pig problem with LLMs was not that they could do nothing, but how to turn them into products that made good money. Even people in openAI were surprised about what happened.

In many ways, turning tech into products that are useful, good, and don't make life hell is a more interesting issue of our times than the core research itself. We probably want to avoid the valuing capturing platform problem, as otherwise we'll end up seeing governments using ham fisted tools to punish winners in ways that aren't helpful either

diamondage 14 minutes ago||

The uptake forced the bigger companies to act. With image diffusion models too - no corporate lawyer would let a big company release a product that allowed the customer to create any image...but when stable diffusion et al started to grow like they did...there was a specific price of not acting...and it was high enough to change boardroom decisions

nasretdinov 1 hour ago|||

Well, I must say ChatGPT felt much more stable than Meena when I first tried it. But, as you said, it was a few years before ChatGPT was publicly announced :)

AbstractH24 1 hour ago|||

Google and OpenAI are both taking very big gambles with AI, with an eye towards 2036 not 2026. As are many others, but them in particular.

It'll be interesting to see which pays off and which becomes Quibi

schiffern 17 minutes ago|||

  >I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.

So for the record, that's 3+ years behind Tesla.

https://www.youtube.com/watch?v=ODSJsviD_SU&t=3594s

tapoxi 7 minutes ago||

Aren't they still using safety drivers or safety follow cars and in fewer cities? Seems Tesla is pretty far behind.

schiffern 1 minute ago||

What do you think I said that you're contradicting?

IMO the presence of safety drivers is just a sensible "as low as reasonably achievable" measure during the early rollout. I'm not sure that can be used as a point against them. I'm comfortably with Tesla "sparing no expense" for safety, since I think we all (including Tesla) understand that this isn't the ultimate implementation.

xnx 4 hours ago|||

> Suddenly all this focus on world models by Deep mind starts to make sense

Google's been thinking about world models since at least 2018: https://arxiv.org/abs/1803.10122

anp 2 hours ago||

FWIW I understood GP to mean that it suddenly makes sense to them, not that there’s been a sudden focus shift at google.

mooktakim 4 hours ago|||

Tesla built something like this for FSD training, they presented many years ago. I never understood why they did productize it. It would have made a brilliant Maps alternative, which country automatically update from Tesla cars on the road. Could live update with speed cameras and road conditions. Like many things they've fallen behind

berryg 3 hours ago|||

No Lidar anymore on the 2026 Volvo models ES60 and EX60. See for example: https://www.jalopnik.com/2032555/volvo-ends-luminar-lidar-20...

senordevnyc 2 hours ago||

I love Volvo, am considering buying one in a couple weeks actually, but they're doing nothing interesting in terms of ADAS, as far as I can tell. It seems like they're limited to adaptive cruise control and lane keeping, both of which have been solved problems for more than a decade.

It sounds like they removed Lidar due to supplier issues and availability, not because they're trying to build self-driving cars and have determined they don't need it anymore.

ruszki 1 hour ago|||

Is lane keeping really a solved problem? Just last year one of my brand new rented cars tried to kill me a few times when I tried it again, and so far not even the simple lane leaving detection mechanism worked properly in any of the tried cars when it was raining.

nfg 1 hour ago||||

I’d suggest doing some research on software quality. Two years back I was all for buying one (I was considering an EX40), but I got myself into some Facebook groups for owners and was shocked at the dreadful reports of quality of the software and it completely put me off. I got an ID4 instead. Reports about the EX90 have been dreadful. I was very interested, and I still admire their look and build when they drive by - but it killed my enthusiasm to buy one for a few years until they get it right.

fuckyah 2 hours ago|||

[dead]

jellojello 4 hours ago|||

Without Lidar + the terrible quality of tesla onboard cameras.. street view would look terrible. The biggest L of elon's career is the weird commitment to no-lidar. If you've ever driven a Tesla, it gives daily messages "the left side camera is blocked" etc.. cameras+weather don't mix either.

ASalazarMX 3 hours ago|||

At first I gave him the benefit of the doubt, like that weird decision of Steve Jobs banning Adobe Flash, which ran most of the fun parts of the Internet back then, that ended up spreading HTML5. Now I just think he refused LIDAR on purely aesthetic reasons. The cost is not even that significant compared to the overall cost of a Tesla.

ciberado 56 minutes ago|||

That one was motivated by the need of controlling the app distribution channel, just like they keep the web as a second class citizen in their ecosystem nowadays.

iamtheworstdev 3 hours ago||||

he didn't refuse it. MobileEye or whoever cut Tesla off because they were using the lidar sensors in a way he didn't approve. From there he got mad and said "no more lidar!"

semiquaver 1 hour ago|||

Assuming what you say is true, are they the only LIDAR vendor?

iknowstuff 3 hours ago|||

False. Mobileye never used lidar. Lmao where do you all come up with this

nerdsniper 2 hours ago|||

I think Elon announced Tesla was ditching LIDAR in 2019.[0] This was before Mobileye offered LIDAR. Mobileye has used LIDAR from Luminar Technologies around 2022-2025. [1][2] They were developing their own lidar, but cancelled it. [3] They chose Innoviz Technologies as their LIDAR partner going forward for future product lines. [4]

0: https://techcrunch.com/2019/04/22/anyone-relying-on-lidar-is...

1: https://static.mobileye.com/website/corporate/media/radar-li...

2: https://www.luminartech.com/updates/luminar-accelerates-comm...

3: https://www.youtube.com/watch?v=Vvg9heQObyQ&t=48s

4: https://ir.innoviz.tech/news-events/press-releases/detail/13...

Fricken 18 minutes ago|||

The original Mobileye EyeQ3 devices that Tesla began installing in their cars in 2013 had only a single forward facing camera. They were very simple devices, only intended to be used for lane keeping. Tesla hacked the devices and pushed them beyond their safe design constraints.

Then that guy got decapitated when his Model S drove under a semi-truck that was crossing the highway and Mobileye terminated the contract. Weirdly, the same fatal edge case occurred 2 more times at least on Tesla's newer hardware.

https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe...

iknowstuff 2 hours ago|||

Never with the product used by Tesla early on.

agildehaus 3 hours ago|||

https://www.mobileye.com/news/mobileye-to-end-internal-lidar...

Um, yes they did.

No idea if it had any relation to Tesla though.

iknowstuff 2 hours ago||

Did not

smallmancontrov 3 hours ago||||

His stated reason was that he wanted the team focused on the driving problem, not sensor fusion "now you have two problems" problems. People assumed cost was the real reason, but it seems unfair to blame him for what people assumed. Don't get me wrong, I don't like him either, but that's not due to his autonomous driving leadership decisions, it's because of shitting up twitter, shitting up US elections with handouts, shitting up the US government with DOGE, seeking Epstein's "wildest party," DARVO every day, and so much more.

jellojello 3 hours ago||

Sensor fusion is an issue, one that is solvable over time and investment in the driving model, but sensor-can't-see-anything is a show stopper.

Having a self-driving solution that can be totally turned off with a speck of mud, heavy rain, morning dew, bright sunlight at dawn and dusk.. you can't engineer your way out of sensor-blindness.

I don't want a solution that is available to use 98% of the time, I want a solution that is always-available and can't be blinded by a bad lighting condition.

I think he did it because his solution always used the crutch of "FSD Not Available, Right hand Camera is Blocked" messaging and "Driver Supervision" as the backstop to any failure anywhere in the stack. Waymo had no choice but to solve the expensive problem of "Always Available and Safe" and work backwards on price.

smallmancontrov 2 hours ago||

LIDAR is notoriously easy to blind, what are you on about? Bonus meme: LIDAR blinds you(r iPhone camera)!

jellojello 3 hours ago|||

[dead]

verelo 3 hours ago||||

Yeah its absurd. As a Tesla driver, I have to say the autopilot model really does feel like what someone who's never driven a car before thinks it's like.

Using vision only is so ignorant of what driving is all about: sound, vibration, vision, heat, cold...these are all clues on road condition. If the car isn't feeling all these things as part of the model, you're handicapping it. In a brilliant way Lidar is the missing piece of information a car needs without relying on multiple sensors, it's probably superior to what a human can do, where as vision only is clearly inferior.

smallmancontrov 3 hours ago|||

The inputs to FSD are:

    7 cameras x 36fps x 5Mpx x 30s
    48kHz audio
    Nav maps and route for next few miles
    100Hz kinematics (speed, IMU, odometry, etc)

Source: https://youtu.be/LFh9GAzHg1c?t=571

ambicapter 3 hours ago|||

So if they’re already “fusioning” all these things, why would LIDAR be any different?

smallmancontrov 2 hours ago||

Tesla went nothing-but-nets (making fusion easy) and Chinese LIDAR became cheap around 2023, but monocular depth estimation was spectacularly good by 2021. By the time unit cost and integration effort came down, LIDAR had very little to offer a vision stack that no longer struggled to perceive the 3D world around it.

Also, integration effort went down but it never disappeared. Meanwhile, opportunity cost skyrocketed when vision started working. Which layers would you carve resources away from to make room? How far back would you be willing to send the training + validation schedule to accommodate the change? If you saw your vision-only stack take off and blow past human performance on the march of 9s, would you land the plane just because red paint became available and you wanted to paint it red?

I wouldn't completely discount ego either, but IMO there's more ego in the "LIDAR is necessary" case than the "LIDAR isn't necessary" at this point. FWIW, I used to be an outspoken LIDAR-head before 2021 when monocular depth estimation became a solved problem. It was funny watching everyone around me convert in the opposite direction at around the same time, probably driven by politics. I get it, I hate Elon's politics too, I just try very hard to keep his shitty behavior from influencing my opinions on machine learning.

magicalist 1 hour ago|||

> but monocular depth estimation was spectacularly good by 2021

It's still rather weak and true monocular depth estimation really wasn't spectacularly anything in 2021. It's fundamentally ill posed and any priors you use to get around that will come to bite you in the long tail of things some driver will encounter on the road.

The way it got good is by using camera overlap in space and over time while in motion to figure out metric depth over the entire image. Which is, humorously enough, sensor fusion.

smallmancontrov 18 minutes ago||

It was spectacularly good before 2021, 2021 is just when I noticed that it had become spectacularly good. 7.5 billion miles later, this appears to have been the correct call.

kanbara 1 hour ago||||

depth estimation is but one part of the problem— atmospheric and other conditions which blind optical visible spectrum sensors, lack of ambient (sunlight) and more. lidar simply outperforms (performs at all?) in these conditions. and provides hardware back distance maps, not software calculated estimation

gibolt 1 hour ago||

Lidar fails worse than cameras in nearly all those conditions. There are plenty of videos of Tesla's vision-only approach seeing obstacles far before a human possibly could in all those conditions on real customer cars. Many are on the old hardware with far worse cameras

kranke155 1 hour ago||||

Always thought the case was for sensor redundancy and data variety - the stuff that throws off monocular depth estimation might not throw off a lidar or radar.

7e 1 hour ago|||

Monocular depth estimation can be fooled by adversarial images, or just scenes outside of its distribution. It's a validation nightmare and a joke for high reliability.

gibolt 1 hour ago||

It isn't monocular though. A Tesla has 2 front-facing cameras, narrow and wide-angle. Beyond that, it is only neural nets at this point, so depth estimation isn't directly used; it is likely part of the neural net, but only the useful distilled elements.

verelo 3 hours ago||||

Better than I expected. So this was 3 days ago, is this for all previously models or is there a cut off date here?

ChicagoDave 2 hours ago|||

Fog, heavy rain, heavy snow, people running between cars or from an obstructed view…

None of these technologies can ever be 100%, so we’re basically accepting a level of needless death.

Musk has even shrugged off FSD related deaths as, “progress”.

smallmancontrov 2 hours ago||

Humans: 70 deaths in 7 billion miles

FSD: 2 deaths in 7 billion miles

Looks like FSD saves lives by a margin so fat it can probably survive most statistical games.

elgenie 32 minutes ago|||

Isn't there a great deal of gaming going on with the car disengaging FSD milliseconds before crashing? Voila, no "full" "self" driving accident; just another human failing [*]!

[*] Failing to solve the impossible situation FSD dropped them into, that is.

smallmancontrov 11 minutes ago||

Nope. NHTSA's criteria for reporting is active-within-30-seconds.

https://www.nhtsa.gov/laws-regulations/standing-general-orde...

If there's gamesmanship going on, I'd expect the antifan site linked below to have different numbers, but it agrees with the 2 deaths figure for FSD.

hn_acc1 1 hour ago|||

Is that the official Tesla stat? I've heard of way more Tesla fatalities than that..

ChicagoDave 1 hour ago|||

This is absolutely a Musk defender. FSD and Tesla related deaths are much higher.

https://www.tesladeaths.com/index-amp.html

smallmancontrov 16 minutes ago||

Your own site agrees with me:

> 2 fatalities involving the use of FSD

Fricken 16 minutes ago|||

I don't know what he's on about. Here's a better list:

https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe...

smallmancontrov 9 minutes ago||

Autopilot is the shitty lane assist. FSD is the SOTA neural net. The page you linked agrees with the 2 FSD deaths figure:

> two that NHTSA's Office of Defect Investigations determined as happening during the engagement of Full Self-Driving (FSD) after 2022.

torginus 1 hour ago||||

I quickly googled Lidar limitations, and this article came up:

https://www.yellowscan.com/knowledge/how-weather-really-affe...

Seeing how its by a lidar vendor, I don't think they're biased against it. It seems Lidar is not a panacea - it struggles with heavy rain, snow, much more than cameras do and is affected by cold weather or any contamination on the sensor.

So lidar will only get you so far. I'm far more interested in mmwave radar, which while much worse in spatial resolution, isn't affected by light conditions, weather, can directly measure stuff on the thing its illuminating, like material properties, the speed its moving, the thickness.

Fun fact: mmWave based presence sensors can measure your hearbeat, as the micro-movements show up as a frequency component. So I'd guess it would have a very good chance to detect a human.

I'm pretty sure even with much more rudimentary processing, it'll be able to tell if its looking at a living being.

By the way: what happened to the idea that self-driving cars will be able to talk to each other and combine each other's sensor data, so if there are multiple ones looking at the same spot, you'd get a much improved chance of not making a mistake.

ASalazarMX 3 hours ago||||

Maybe vision-only can work with much better cameras, with a wider spectrum (so they can see thru fog, for example), and self-cleaning/zero upkeep (so you don't have to pull over to wipe a speck of mud from them). Nevertheless, LIDAR still seems like the best choice overall.

iknowstuff 3 hours ago|||

Autopilot hasn’t been updated in years and is nothing like FSD. FSD does use all of those cues.

verelo 3 hours ago||

I misspoke, i'm using Hardware 3 FSD.

0xfaded 3 hours ago||||

I have HW3, but FSD reliably disengages at this time of year with sunrise and sunset during commute hours.

jellojello 3 hours ago|||

Yep, and won't activate until any morning dew is off the sensors.. or when it rains too hard.. or if it's blinded by a shiny building/window/vehicle.

I will never trust 2d camera-only, it can be covered or blocked physically and when it happens FSD fails.

As cheap as LIDAR has gotten, adding it to every new tesla seems to be the best way out of this idiotic position. Sadly I think Elon got bored with cars and moved on.

iknowstuff 3 hours ago|||

FSD14 on hw4 does not. Its dynamic range is equivalent or better than human.

kypro 2 hours ago||||

From the perspective of viewing FSD as an engineering problem that needs solving I tend to think Elon is on to something with the camera-only approach – although I would agree the current hardware has problems with weather, etc.

The issue with lidar is that many of the difficult edge-cases of FSD are all visible-light vision problems. Lidar might be able to tell you there's a car up front, but it can't tell you that the car has it's hazard lights on and a flat tire. Lidar might see a human shaped thing in the road, but it cannot tell whether it's a mannequin leaning against a bin or a human about to cross the road.

Lidar gets you most of the way there when it comes to spatial awareness on the road, but you need cameras for most of the edge-cases because cameras provide the color data needed to understand the world.

You could never have FSD with just lidar, but you could have FSD with just cameras if you can overcome all of the hardware and software challenges with accurate 3D perception.

Given Lidar adds cost and complexity, and most edge cases in FSD are camera problems, I think camera-only probably helps to force engineers to focus their efforts in the right place rather than hitting bottlenecks from over depending on Lidar data. This isn't an argument for camera-only FSD, but from Tesla's perspective it does down costs and allows them to continue to produce appealing cars – which is obviously important if you're coming at FSD from the perspective of an auto marker trying to sell cars.

Finally, adding lidar as a redundancy once you've "solved" FSD with cameras isn't impossible. I personally suspect Tesla will eventually do this with their robotaxis.

That said, I have no real experience with self-driving cars. I've only worked on vision problems and while lidar is great if you need to measure distances and not hit things, it's the wrong tool if you need to comprehend the world around you.

senordevnyc 2 hours ago||

This is so wild to read when Waymo is currently doing like 500,000 paid rides every week, all over the country, with no one in the driver's seat. Meanwhile Tesla seems to have a handful of robotaxis in Austin, and it's unclear if any of them are actually driverless.

But the Tesla engineers are "in the right place rather than hitting bottlenecks from over depending on Lidar data"? What?

smallmancontrov 2 hours ago|||

Tesla has driven 7.5B autonomous miles to Waymo's 0.2B, but yes, Waymo looks like they are ahead when you stratify the statistics according to the ass-in-driver-seat variable and neglect the stratum that makes Tesla look good.

The real question is whether doing so is smart or dumb. Is Tesla hiding big show-stopper problems that will prevent them from scaling without a safety driver? Or are the big safety problems solved and they are just finishing the Robotaxi assembly line that will crank out more vertically-integrated purpose-designed cars than Waymo's entire fleet every day before lunch?

hn_acc1 1 hour ago|||

Tesla's also been involved in WAY more accidents than Waymo - and has tried to silence those people, claim FSD wasn't active, etc.

What good is a huge fleet of Robotaxis if no one will trust them? I won't ever set foot in a Robotaxi, as long as Elon is involved.

kypro 47 minutes ago||

There's more Tesla's on the road than Waymo's by several orders of magnitude. Additionally the types of roads and conditions Tesla's drive under is completely incomparable to Waymo.

jamespo 22 minutes ago||

Yes that was accounted for above, but this isn't autonomous apples to apples

jasondigitized 32 minutes ago|||

semi autonomous

kypro 55 minutes ago|||

I wasn't arguing Tesla is ahead of Waymo? Nor do I think they are. All I was arguing was that it makes sense from the perspective of a consumer automobile maker to not use lidar.

I don't think Tesla is that far behind Waymo though given Waymo has had a significant head start, the fact Waymo has always been a taxi-first product, and given they're using significantly more expensive tech than Tesla is.

Additionally, it's not like this is a lidar vs cameras debate. Waymo also uses and needs cameras for FSD for the reasons I mentioned, but they supplement their robotaxis with lidar for accuracy and redundancy.

My guess is that Tesla will experiment with lidar on their robotaxis this year because design decisions should differ from those of a consumer automobile. But I could be wrong because if Tesla wants FSD to work well on visually appealing and affordable consumer vehicles then they'll probably have to solve some of the additional challenges with with a camera-only FSD system. I think it will depend on how much Elon decides Tesla needs to pivot into robotaxis.

Either way, what is undebatable is that you can't drive with lidar only. If the weather is so bad that cameras are useless then Waymos are also useless.

gambiting 2 hours ago|||

>>The biggest L of elon's career is the weird commitment to no-lidar.

I thought it was the Nazi salutes on stage and backing neo-nazi groups everywhere around the world, but you know, I guess the lidar thing too.

smeeth 4 hours ago|||

I always understood this to be why Tesla started working on humanoid robots

ACCount37 3 hours ago|||

Pretty much. They banked on "if we can solve FSD, we can partially solve humanoid robot autonomy, because both are robots operating in poorly structured real world environments".

jasondigitized 32 minutes ago||||

I don't want a humanoid robot. I want a purpose built robot.

Fricken 14 minutes ago||||

It's so they can stick a Tesla logo on a bunch of chinese tech and call it innovation.

smt88 2 hours ago|||

They started working on humanoid robots because Musk always has to have the next moonshot, trillion-dollar idea to promise "in 3 years" to keep the stock price high.

As soon as Waymo's massive robotaxi lead became undeniable, he pivoted to from robotaxis to humanoid robots.

senordevnyc 2 hours ago||

Yeah, that and running Grok on a trillion GPUs in space lol

Dig1t 53 minutes ago|||

>or grok's porn

I know it’s gross, but I would not discount this. Remember why Blu-ray won over HDDVD? I know it won for many other technical reasons, but I think there are a few historical examples of sexual content being a big competitive advantage.

dmd 3 hours ago|||

Which is why it's embarrassing how much worse Gemini is at searching the web for grounding information, and how incredibly bad gemini cli is.

xnx 1 hour ago||

Not my experience in either of those areas.

coffeemug 3 hours ago|||

The vertical integration argument should apply to Grok. They have Tesla driving data (probably much more data than Waymo), Twitter data, plus Tesla/SpaceX manufacturing data. When/if Optimus starts on the production line, they'll have that data too. You could argue they haven't figured out how to take advantage of it, but the potential is definitely there.

BoredPositron 3 hours ago|||

Agreed. Should they achieve Google level integration, we will all make sure they are featured in our commentary. Their true potential is surely just around the corner...

jeffbee 2 hours ago|||

"Tesla has more data than Waymo" is some of the lamest cope ever. Tesla does not have more video than Google! That's crazy! People who repeat this are crazy! If there was a massive flow of video from Tesla cars to Tesla HQ that would have observable side effects.

thefounder 3 hours ago|||

But somehow google fails to execute. Gemini is useless for programming and I don’t think even bother to use it as chat app. Claude code + gpt 5.2 xhigh for coding and gpt as chat app are really the only ones that are worth it(price and time wise)

coffeemug 2 hours ago|||

I've recently switched to Claude for chat. GPT 5.2 feels very engagement-maxxed for me, like I'm reading a bad LinkedIn post. Claude does a tiny bit of this too, but an order of magnitude less in my experience. I never thought I'd switch from ChatGPT, but there is only so much "here's the brutal truth, it's not x it's y" I can take.

thechao 2 hours ago|||

GPT likes to argue, and most of its arguments are straw man arguments, usually conflating priors. It's ... exhausting; akin to arguing on the internet. (What am I even saying, here!?) Claude's a lot less of that. I don't know if tracks discussion/conversation better; but, for damn sure, it's got way less verbal diarrhea than GPT.

mrlongroots 2 hours ago||

Yes, GPT5-series thinking models are extremely pedantic and tedious. Any conversation with them is derailed because they start nitpicking something random.

But Codex/5.2 was substantially more effective than Claude at debugging complex C++ bugs until around Fall, when I was writing a lot more code.

I find Gemini 3 useless. It has regressed on hallucinations from Gemini 2.5, to the point where its output is no better than a random token stream despite all its benchmark outperformance. I would use Gemini 2.5 to help write papers and all, can't see to use Gemini 3 for anything. Gemini CLI also is very non-compliant and crazy.

thefounder 1 hour ago||||

To me ChatGPT seems smarter and knows more. That’s why I use it. Even Claude rates gpt better for knowledge answers. Not sure if that itself is any indication. Claude seems superficial unless you hammer it to generate a good answer.

aschla 2 hours ago|||

Experiencing the same. It seems Anthropic’s human-focused design choices are becoming a differentiator.

henryfjordan 2 hours ago||||

Gemini works well enough in Search and in Meet. And it's baked into the products so it's dead simple to use.

I don't think Google is targeting developers with their AI, they are targeting their product's users.

unsupp0rted 1 hour ago||||

Gemini is by far the best UI/UX designer model. Codex seems to the worst: it'll build something awkward and ugly, then Gemini will take 30-60 seconds to make it look like something that would have won a design award a couple years ago.

noelsusman 2 hours ago||||

It is a bit mind boggling how behind they were considering they invented transformers and were also sitting on the best set of training data in the world, but they've caught up quite a bit. They still lag behind in coding, but I've found Gemini to be pretty good at more general knowledge tasks. Flash 3 in particular is much better than anything of comparable price and speed from OpenAI or Anthropic.

spiderfarmer 1 hour ago|||

Grok/xAI is a joke at this point. A true money pit without any hopes for a serious revenue stream.

They should be bought by a rocket company. Then they would stand a chance.

jasondigitized 2 hours ago|||

The flywheel is starting to spin......

uoaei 1 hour ago|||

What an upsetting comment. I'm glad you came around but what did you think was going to be effective before you came around to world models?

themafia 3 hours ago|||

It's a 3500lb robot that can kill you.

Boston Robotics is working on a smaller robot that can kill you.

Anduril is working on even smaller robots that can kill you.

The future sucks.

zzzeek 3 hours ago||

and they're all controlled by (poorly compensated) humans anyway [1] [2]

[1] https://www.wsj.com/tech/personal-tech/i-tried-the-robot-tha...

[2] https://futurism.com/advanced-transport/waymos-controlled-wo...

themafia 1 hour ago||

They couldn't even make burger flipping robots work and are paying fast food workers $20/hr in California.

If that doesn't make it obvious what they can and cannot do then I can't respect the tranche of "hackers" who blindly cheer on this unchecked corporate dystopian nightmare.

sdf2erf 3 hours ago||

"Waymo as a robot in the same way"

Erm, a dishwasher, washing machine, automated vacuum can be considered robots. Im confused as to this obsession of the term - there are many robots that already exist. Robotics have been involved in the production of cars for decades.

......

ASalazarMX 3 hours ago|||

I think the (gray) line is the degree of autonomy. My washing machine makes very small, predictable decisions, while a Waymo has to manage uncertainty most of the time.

sdf2erf 3 hours ago||

Its irrelevant. A robot is a robot.

Dictionary def: "a machine controlled by a computer that is used to perform jobs automatically."

mattlondon 3 hours ago|||

No one is denying that robots existed already (but I would hardly call a dishwasher a robot FWIW)

But in my mind a waymo was always a "car with sensors", but more recently (especially having recently used them a bunch in California recently) I've come to think of them truly as robots.

saghm 2 hours ago||||

A robot is a robot, and a human is a creature that won't necessarily agree with another human on what the definition of a word is. Dictionaries are also written by humans and don't necessarily reflect the current consensus, especially on terms where people's understanding might evolve over time as technology changes.

Even if that definition were universally agreed on l upon though, that's not really enough to understand what the parent comment was saying. Being a robot "in the same way" as something else is even less objective. Humans are humans, but they're also mammals; is a human a mammal "in the same way" as a mouse? Most humans probably have a very different view of the world than most mice, and the parent comment was specifically addressing the question of whether it makes sense for an autonomous car to model the world the same way as other robots or not. I don't see how you can dismiss this as "irrelevant" because both humans and mice are mammals (or even animals; there's no shortage of classifications out there) unless you're completely having a different conversation than the person you responded to. You're not necessarily wrong because of that, but you're making a pretty significant misjudgment if you think that's helpful to them or to anyone else involved in the ongoing conversation.

ASalazarMX 3 hours ago||||

TIL fuel injectors are robots. Probably my ceiling lights too.

Maybe we need to nitpick about what a job is exactly? Or we could agree to call Waymos (semi)autonomous robots?

goatlover 3 hours ago|||

In the same way people online have argued helicopters are flying cars, it doesn't capture what most people mean when they use the word "robot", anymore than helicopters are what people have in mind when they mention flying cars.

yummypaint 36 minutes ago||

By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.

How do you know the generated outputs are correct? Especially for unusual circumstances?

Say the scenario is a patch of road is densely covered with 5 mm ball bearings. I'm sure the model will happily spit out numbers, but are they reasonable? How do we know they are reasonable? Even if the prediction is ok, how do we fundamentally know that the prediction for 4 mm ball bearings won't be completely wrong?

There seems to be a lot of critical information missing.

IMTDb 11 minutes ago||

The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.

For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.

We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them; creating a virtuous cycle.

joshfee 27 minutes ago|||

Isn't that true for any scenario previously unencountered, whether it is a digital simulation or a human? We can't optimize for the best possible outcome in reality (since we can't predict the future), but we can optimize for making the best decisions given our knowledge of the world (even if it is imperfect).

In other words it is a gradient from "my current prediction" to "best prediction given my imperfect knowledge" to "best prediction with perfect knowledge", and you can improve the outcome by shrinking the gap between 1&2 or shrinking the gap between 2&3 (or both)

ses1984 14 minutes ago|||

You could train it in simulation and then test it in reality.

inkysigma 12 minutes ago||

Would it actually be a good idea to operate a car near an active tornado?

fooker 29 minutes ago|||

> from a tornado to a casual encounter with an elephant

A sims style game with this technology will be pretty nice!

aaaalone 33 minutes ago||

They probably just look at the results of the generation.

I mean would I like a in-depth tour of this? Yes.

But it's a marketing blog article, what do you expect?

parliament32 26 minutes ago||

> just look at the results of the generation

And? The entire hallucination problem with text generators is "plausible sounding yet incorrect", so how does a human eyeballing it help at all?

inkysigma 6 minutes ago||

I think because here there's no single correct answer that the model is allowed to be fuzzier. You still mix in real training data and maybe more physics based simulation of course but it does seem acceptable that you synthesize extremely tail evaluations since there isn't really a "better" way by definition and you can evaluate the end driving behavior after training.

You can also probably still use it for some kinds of evaluation as well since you can detect if two point clouds intersect presumably.

In much a similar way that LLMs are not perfect at translation but are widely used anyway for NMT.

xnx 6 hours ago||

> The Waymo World Model can convert those kinds of videos, or any taken with a regular camera, into a multimodal simulation—showing how the Waymo Driver would see that exact scene.

Subtle brag that Waymo could drive in camera-only mode if they chose to. They've stated as much previously, but that doesn't seem widely known.

bonsai_spool 5 hours ago||

I think I'm misunderstanding - they're converting video into their representation which was bootstrapped with LIDAR, video and other sensors. I feel you're alluding to Tesla, but Tesla could never have this outcome since they never had a LIDAR phase.

(edit - I'm referring to deployed Tesla vehicles, I don't know what their research fleet comprises, but other commenters explain that this fleet does collect LIDAR)

smallmancontrov 5 hours ago|||

They can and they do.

https://youtu.be/LFh9GAzHg1c?t=872

They've also built it into a full neural simulator.

https://youtu.be/LFh9GAzHg1c?t=1063

I think what we are seeing is that they both converged on the correct approach, one of them decided to talk about it, and it triggered disclosure all around since nobody wants to be seen as lagging.

tfehring 4 hours ago|||

I watched that video around both timestamps and didn't see or hear any mention of LIDAR, only of video.

smallmancontrov 4 hours ago||

Exactly: they convert video into a world model representation suitable for 3D exploration and simulation without using LIDAR (except perhaps for scale calibration).

tfehring 4 hours ago||

My mistake - I misinterpreted your comment, but after re-reading more carefully, it's clear that the video confirms exactly what you said.

IhateAI_3 3 hours ago|||

tesla is not impressive, I would never put my child in one

yakz 5 hours ago|||

Tesla does collect LIDAR data (people have seen them doing it, it's just not on all of the cars) and they do generate depth maps from sensor data, but from the examples I've seen it is much lower resolution than these Waymo examples.

justapassenger 5 hours ago||

Tesla does it to map the areas to come up with high def maps for areas where their cars try to operate.

vardump 5 hours ago||

Tesla uses lidar to train their models to generate depth data out of camera input. I don’t think they have any high definition maps.

ActorNightly 5 hours ago|||

The purpose of lidar is to prove error correction when you need it most in terms of camera accuracy loss.

Humans do this, just in the sense of depth perception with both eyes.

robotresearcher 2 hours ago|||

Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.

aylons 2 hours ago|||

Thanks, saved some work.

And I'll add that it in practice it is not even that much unless you're doing some serious training, like a professional athlete. For most tasks, the accurate depth perception from this fades around the length of the arms.

cyanydeez 2 hours ago|||

ok, but a care is a few meters wide, isn't that enough for driving depth perception similar to humans

robotresearcher 2 hours ago||

The depths you are trying to estimate are to the other cars, people, turnings, obstacles, etc. Could be 100m away or more on the highway.

cyanydeez 5 minutes ago||

ok, but the point trying to be made is based on human's depth perception, but a car's basic limitation is the width of the vehicle, so there's missing information if you're trying to figure out if a car can use cameras to do what human eyes/brains do.

dbt00 5 hours ago||||

(Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance, which is why so many people get simulator sickness from stereoscopic 3d VR)

wolrah 4 hours ago|||

> Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance

Also subtle head and eye movements, which is something a lot of people like to ignore when discussing camera-based autonomy. Your eyes are always moving around which changes the perspective and gives a much better view of depth as we observe parallax effects. If you need a better view in a given direction you can turn or move your head. Fixed cameras mounted to a car's windshield can't do either of those things, so you need many more of them at higher resolutions to even come close to the amount of data the human eye can gather.

CobrastanJorji 2 hours ago||||

I keep wondering about the focal depth problem. It feels potentially solvable, but I have no idea how. I keep wondering if it could be as simple as a Magic Eye Autostereogram sort of thing, but I don't think that's it.

There have been a few attempts at solving this, but I assume that for some optical reason actual lenses need to be adjusted and it can't just be a change in the image? Meta had "Varifocal HMDs" being shown off for a bit, which I think literally moved the screen back and forth. There were a couple of "Multifocal" attempts with multiple stacked displays, but that seemed crazy. Computer Generated Holography sounded very promising, but I don't know if a good one has ever been built. A startup called Creal claimed to be able to use "digital light fields", which basically project stuff right onto the retina, which sounds kinda hogwashy to me but maybe it works?

mikepurvis 4 hours ago||||

My understanding is that contextual clues are a big part of it too. We see a the pitcher wind up and throw a baseball as us more than we stereoscopically track its progress from the mound to the plate.

More subtly, a lot of depth information comes from how big we expect things to be, since everyday life is full of things we intuitively know the sizes of, frames of reference in the form of people, vehicles, furniture, etc . This is why the forced perspective of theme park castles is so effective— our brains want to see those upper windows as full sized, so we see the thing as 2-3x bigger than it actually is. And in the other direction, a lot of buildings in Las Vegas are further away than they look because hotels like the Bellagio have large black boxes on them that group a 2x2 block of the actual room windows.

kevindamm 4 hours ago|||

Actually the reason people experience vection in VR is not focal depth but the dissonance between what their eyes are telling them and what their inner ear and tactile senses are telling them.

It's possible they get headaches from the focal length issues but that's different.

pants2 5 hours ago||||

Another way humans perceive depth is by moving our heads and perceiving parallax.

menaerus 5 hours ago||||

How expensive is their lidar system?

hangonhn 5 hours ago|||

Hesai has driven the cost into the $200 to 400 range now. That said I don't know what they cost for the ones needed for driving. Either way we've gone from thousands or tens of thousands into the hundreds dollar range now.

bragr 4 hours ago||

Looking at prices, I think you are wrong and automotive Lidar is still in the 4 to 5 figure range. HESAI might ship Lidar units that cheap, but automotive grade still seems quite expensive: https://www.cratustech.com/shop/lidar/

tzs 3 hours ago|||

Those are single unit prices. The AT128 for instance, which is listed at $6250 there and widely used by several Chinese car companies was around $900 per unit in high volume and over time they lowered that to around $400.

The next generation of that, the ATX, is the one they have said would be half that cost. According to regulator filings in China BYD will be using this on entry level $10k cars.

Hesai got the price down for their new generation by several optimizations. They are using their own designs for lasers, receivers, and driver chips which reduced component counts and material costs. They have stepped up production to 1.5 million units a year giving them mass production efficiencies.

bragr 2 hours ago||

That model only has a 120 degree field of view so you'd need 3-4 of them per car (plus others for blind spots, they sell units for that too). That puts the total system cost in the low thousands, not the 200 to 400 stated by GP. I'm not saying it hasn't gotten cheaper or won't keep getting cheaper, it just doesn't seem that cheap yet.

jellojello 4 hours ago|||

[dead]

jmux 4 hours ago||||

Waymo does their LiDAR in-house, so unfortunately we don’t know the specs or the cost

nerdsniper 4 hours ago|||

Otto and Uber and the CEO of https://pronto.ai do though (tongue-in-cheek)

> Then, in December 2016, Waymo received evidence suggesting that Otto and Uber were actually using Waymo’s trade secrets and patented LiDAR designs. On December 13, Waymo received an email from one of its LiDAR-component vendors. The email, which a Waymo employee was copied on, was titled OTTO FILES and its recipients included an email alias indicating that the thread was a discussion among members of the vendor’s “Uber” team. Attached to the email was a machine drawing of what purported to be an Otto circuit board (the “Replicated Board”) that bore a striking resemblance to – and shared several unique characteristics with – Waymo’s highly confidential current-generation LiDAR circuit board, the design of which had been downloaded by Mr. Levandowski before his resignation.

The presiding judge, Alsup, said, "this is the biggest trade secret crime I have ever seen. This was not small. This was massive in scale."

(Pronto connection: Levandowski got pardoned by Trump and is CEO of Pronto autonomous vehicles.)

https://arstechnica.com/tech-policy/2017/02/waymo-googles-se...

ra7 3 hours ago|||

We know Waymo reduced their LiDAR price from $75,000 to ~$7500 back in 2017 when they started designing them in-house: https://arstechnica.com/cars/2017/01/googles-waymo-invests-i...

That was 2 generations of hardware ago (4th gen Chrysler Pacificas). They are about to introduce 6th gen hardware. It's a safe bet that it's much cheaper now, given how mass produced LiDARs cost ~$200.

eptcyka 5 hours ago||||

Less than the lives it saves.

xnx 5 hours ago|||

Cheaper every year.

hijnksforall 4 hours ago||

Exactly.

Tesla told us their strategy was vertical integration and scale to drive down all input costs in manufacturing these vehicles...

...oh, except lidar, that's going to be expensive forever, for some reason?

SecretDreams 5 hours ago|||

> Humans do this, just in the sense of depth perception with both eyes.

Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much slow, I can still interpret where they would be based on my familiarity with the roads and my implicit knowledge of how roads work, e.g. We do similar things for heavy rain or fog, although, sometimes those situations truly necessitate pulling over or slowing down and turning on your 4s - lidar might genuinely given an advantage there.

pookeh 5 hours ago|||

That’s the purpose of the neural networks

array_key_first 4 hours ago||

Yes and no - vibes and instincts isn't just thought, it's real senses. Humans have a lot of senses; dozens of them. Including balance, pain, sense of passage of time, and body orientation. Not all of these senses are represented in autonomous vehicles, and it's not really clear how the brain mashes together all these senses to make decisions.

mycall 5 hours ago|||

That is still important for safety reasons in case someone uses a LiDAR jamming system to try to force you into an accident.

etrautmann 5 hours ago|||

It’s way easier to “jam” a camera with bright light than a lidar, which uses both narrow band optical filters and pulsed signals with filters to detect that temporal sequence. If I were an adversary, going after cameras is way way easier.

sroussey 4 hours ago||

Oh yeah, point a q-beam at a Tesla at night, lol. Blindness!

Jyaif 5 hours ago|||

If somebody wants to hurt you while you are traveling in a car, there are simpler ways.

shihab 5 hours ago|||

I think there are two steps here: converting video to sensor data input, and using that sensor data to drive. Only the second step will be handled by cars on road, first one is purely for training.

sschueller 3 hours ago|||

Autonomous cars need to be significantly better than humans to be fully accepted especially when an accident does happen. Hence limiting yourself to only cameras is futile.

dooglius 4 hours ago|||

They may be trying to suggest that, that claim does not follow from the quoted statement.

uejfiweun 5 hours ago||

I've always wondered... if Lidar + Cameras is always making the right decision, you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.

olex 5 hours ago|||

That's exactly what Tesla is doing with their validation vehicles, the ones with Lidar towers on top. They establish the "ground truth" from Lidar and use that to train and/or test the vision model. Presumably more "test", since they've most often been seen in Robotaxi service expansion areas shortly before fleet deployment.

bob_theslob646 5 hours ago||

Is that exactly true though? Can you give a reference for that?

olex 5 hours ago|||

I don't have a specific source, no. I think it was mentioned in one of their presentation a few years back, that they use various techniques for "ground truth" for vision training, among those was time series (depth change over time should be continuous etc) and iirc also "external" sources for depth data, like LiDAR. And their validation cars equipped with LiDAR towers are definitely being seen everywhere they are rolling out their Robotaxi services.

senordevnyc 2 hours ago||

are definitely being seen everywhere they are rolling out their Robotaxi services

So...nowhere?

__alexs 5 hours ago||||

> you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.

Why should you be able to do that exactly? Human vision is frequently tricked by it's lack of depth data.

scarmig 5 hours ago||

"Exactly" is impossible: there are multiple Lidar samples that would map to the same camera sample. But what training would do is build a model that could infer the most likely Lidar representation from a camera representation. There would still be cases where the most likely Lidar for a camera input isn't a useful/good representation of reality, e.g. a scene with very high dynamic range.

dbcurtis 4 hours ago||||

No, I don't think that will be successful. Consider a day where the temperature and humidity is just right to make tail pipe exhaust form dense fog clouds. That will be opaque or nearly so to a camera, transparent to a radar, and I would assume something in between to a lidar. Multi-modal sensor fusion is always going to be more reliable at classifying some kinds of challenging scene segments. It doesn't take long to imagine many other scenarios where fusing the returns of multiple sensors is going to greatly increase classification accuracy.

etrautmann 5 hours ago|||

Sure, but those models would never have online access to information only provided in lidar data…

tfehring 4 hours ago||

No, but if you run a shadow or offline camera-only model in parallel with a camera + LIDAR model, you can (1) measure how much worse the camera-only model is so you can decide when (if ever) it's safe enough to stop installing LIDAR, and (2) look at the specific inputs for which the models diverge and focus on improving the camera-only model in those situations.

caycep 1 hour ago||

All this work is impressive, but I'd rather have better trains

scoofy 1 hour ago||

As someone who lives in the Bay Area we already have trains, and they're literally past the point of bankruptcy because they (1) don't actually charge enough maintain the variable cost of operations, (2) don't actually make people pay at all, and (3) don't actually enforce any quality of life concerns short of breaking up literal fights. All of this creates negative synergies that pushes a huge, mostly silent segment of the potential ridership away from these systems.

So many people advocate for public transit, but are unwilling to deal with the current market tradeoffs and decisions people are making on the ground. As long as that keeps happening, expect modes of transit -- like Waymo -- that deliver the level of service that they promise to keep exceeding expectations.

I've spent my entire adult life advocating for transportation alternatives, and at every turn in America, the vast majority of other transit advocates just expect people to be okay with anti-social behavior going completely unenforced, and expecting "good citizens" to keep paying when the expected value for any rational person is to engage in freeloading. Then they point to "enforcing the fare box" as a tradeoff between money to collect vs cost of enforcement, when the actually tradeoff is the signalling to every anti-social actor in the system that they can do whatever they want without any consequences.

I currently only see a future in bike-share, because it's the only system that actually delivers on what it promises.

doctoboggan 51 minutes ago|||

> they (1) don't actually charge enough maintain the variable cost of operations

Why do you expect them to make money? Roads don't make money and no one thinks to complain about that. One of the purposes of government is to make investment in things that have more nebulous returns. Moving more people to public transit makes better cities, healthier and happier citizens, stronger communities, and lets us save money on road infrastructure.

scoofy 43 minutes ago||

>Why do you expect them to make money?

I don't.

That's why I said "variable cost of operations."

If a system doesn't generate enough revenue to cover the variable costs of operation, then every single new passenger drives the system closer to bankruptcy. The more "successful" the system is -- the more people depend on it -- the more likely it is to fail if anything happens to the underlying funding source, like a regular old local recession. This simple policy decision can create a downward economic spiral when a recession leads to service cuts, which leads to people unable to get to work reliably, which creates more economic pain, which leads to a bigger recession... rinse/repeat. This is why a public transit system should cover variable costs so that a successful system can grow -- and shrink -- sustainably.

When you aren't growing sustainably, you open yourself up to the whims of the business cycle literally destroying your transit system. It's literally happening right now with SF MUNI, where we've had so many funding problems, that they've consolidated bus lines. I use the 38R, and it's become extremely busy. These busses are getting so packed that people don't want to use them, but the point is they can't expand service because each expansion loses them more money, again, because the system doesn't actually cover those variable costs.

The public should be 100% completely covering the fixed capital costs of the system. Ideally, while there is a bit of wiggle room, the ridership should be 100% be covering the variable capital costs. That way the system can expand when it's successful, and contract when it's less popular. Right now in the Bay Area, you have the worst of both worlds, you have an underutilized system with absolutely spiraling costs, simply because there is zero connection between "people actually wanting to use the system" and "where the money comes from."

martinald 5 minutes ago||||

You're definitely right on (2) and (3). I've used many transit systems across the world (including TransMilenio in Bogota and other latam countries "renowned" for crime) and I have never felt as unsafe as I have using transit in the SFBA. Even standing at bus stops draws a lot of attention from people suffering with serious addiction/mental health problems.

1) is a bit simplistic though. I don't know of any European system that would cover even operating costs out of fare/commercial revenue. Potentially the London Underground - but not London buses. UK National Rail had higher success rates

The better way to look at it imo is looking at the economic loss as well of congestion/abandoned commutes. To do a ridiculous hypothetical, London would collapse entirely if it didn't have transit. Perhaps 30-40% of inner london could commute by car (or walk/bike), so the economic benefit of that variable transit cost is in the hundreds of billions a year (compared to a small subsidy).

It's not the same in SFBA so I guess it's far easier to just "write off" transit like that, it is theoretically possible (though you'd probably get some quite extreme additional congestion on the freeways as even that small % moving to cars would have an outsized impact on additional congestion).

caycep 15 minutes ago||||

Maybe not BART but the new Caltrain electrification program seems to be a success and ridership and revenue are up

caycep 1 hour ago|||

Well then invest in those things, then. It would probably cost less than the amount they're spending to make a Waymo World Model.

scoofy 1 hour ago||

Lighting money on fire by funding an extremely expensive system that most people don't want to use is not an "investment." It's just a good way to make everyone much poorer and worse off than if we'd done nothing. The only way to change things is to convince the electorate that we actually do need rules and enforcement and a sustainable transportation system.

This isn't just happening in America. Train systems are in rough shape in the UK and Germany too.

Ebike shares are a much more sustainable system with a much lower cost, and achieve about 90% of the level of service in temperate regions of the country. Even the ski-lift guy in this thread has a much more reasonable approach to public transit, because they actually have extremely low cost for the level of service they provide. Their only real shortcoming is they they don't handle peak demand well, and are not flexible enough to handle their own success.

caycep 30 minutes ago||

People want to use it everywhere in the world

scoofy 25 minutes ago||

People want to have their cake and eat it too.

servo_sausage 35 minutes ago|||

Trains need well behaved people, otherwise they are shit.

I don't want to hear tiktok or full volume soap operas blasting at some deaf mouth breather.

I don't want to be near loud chewing of smelly leftovers.

I don't want to be begged for money, or interact with high or psychotic people.

The current culture doesn't allow enforcement of social behaviour: so public transport will always be a miserable containment vessel for the least functional, and everyone with sense avoids the whole thing.

neysofu 27 minutes ago||

> some deaf mouth breather

I quite agree with the overall point but can we leave this kind of discourse on X, please? It doesn't add much, it just feels caustic for effect and engagement farming.

joenot443 13 minutes ago|||

I think future generations will resent us for bureaucratizing our way out of the California HSR.

chufucious 1 hour ago|||

Me too but given our extensive car brain culture, Waymo is an amazing step to getting less drivers & cars off the road, and to further cement future generations not ever needing to drive or own cars

andoando 1 hour ago|||

Ski lifts man, ski lifts all over the city

bryan_w 23 minutes ago|||

> Ski lifts man, ski lifts all over the city

Don't they have those somewhere in South America?

underdeserver 1 hour ago|||

What a glorious utopia we could have

xnx 1 hour ago||

Isn't a vehicle that goes from anywhere to anywhere on your own schedule, safely, privately, cleanly, and without billions in subsidies better?

anigbrowl 1 hour ago|||

I don't think individual vehicles can ever achieve the same envirnmental economies of scale as trains. Certainly they're far more convenient (especially for short-haul journeys) but I also think they're somewhat alienating, in that they're engineering humans out of the loop completely which contributes to social atomization.

xnx 1 hour ago||

> I don't think individual vehicles can ever achieve the same envirnmental economies of scale as trains.

I think you'd be surprised. Look at the difference in cost per passenger mile.

appreciatorBus 1 hour ago||||

Trains only require subsidies in a world where human & robot cars are subsidized.

As soon as a mode of transport actually has to compete in a market for scarce & valuable land to operate on, trains and other forms of transit (publicly or privately owned) win every time.

kentiko 42 minutes ago||||

Cars don't work in dense places.

g947o 1 hour ago||||

Not necessarily, and your premise is incorrect.

kidk 1 hour ago|||

Billions of subsidies? Im confused you talking about cars or trains.

xnx 1 hour ago||

No major US public transportation system is fully paid for by riders.

caycep 16 minutes ago|||

NYC congestion pricing seems to be working quite well though, and probably helps offset MTA costs.

semiquaver 44 minutes ago||||

Yep. https://www.transitwiki.org/TransitWiki/index.php/Farebox_Re... is a sobering reminder that many cities’ public transportation would cost $20-50 per trip if paid entirely by riders and thus could not exist without subsidy.

JimmyBuckets 1 hour ago|||

That includes cars on public roads.

ra7 5 hours ago||

The novel aspect here seems to be 3D LiDAR output from 2D video using post-training. As far as I'm aware, no other video world models can do this.

IMO, access to DeepMind and Google infra is a hugely understated advantage Waymo has that no other competitor can replicate.

codexb 3 hours ago||

3d from moving 2d images has been a thing for decades.

ra7 3 hours ago||

This is 3D LiDAR output (multimodal) from 2D images.

promiseofbeans 2 hours ago||

LiDAR is the technology used to do spatial capture. The output is just point clouds of surfaces. So they’re generating surface point clouds from video

moffkalast 6 minutes ago||

It's not unheard of, there are a handful [0] of metric monodepth methods that output data that's not unlike a really inaccurate 3D lidar, though theirs certainly looks SOTA.

[0] https://github.com/YvanYin/Metric3D

joshuamerrill 2 hours ago||

It’s impressive to see simulation training for floods, tornadoes, and wildfires. But it’s also kind of baffling that a city full of Waymos all seemed to fail simultaneously in San Francisco when the power went out on Dec 22.

A power outage feels like a baseline scenario—orders of magnitude more common than the disasters in this demo. If the system can’t degrade gracefully when traffic lights go dark, what exactly is all that simulation buying us?

GoatOfAplomb 2 hours ago||

All this simulation buys a single vehicle that drives better. That failure was a fleet-wide event (overloading the remote assistance humans).

That is, both are true: this high-fidelity simulation is valuable and it won't catch all failure modes. Or in other words, it's still on Waymo for failing during the power outage, but it's not uniquely on Waymo's simulation team.

flutas 2 hours ago||

They've also been seen driving directly into flood waters, with one driving through the middle of a flooded parking lot.

https://www.reddit.com/r/SelfDrivingCars/comments/1pem9ep/hm...

nightpool 2 hours ago||

Interesting, but it feels like it's going to cope very poorly with actually safety-critical situations. Having a world model that's trained on successful driving data feels like it's going to "launder" a lot of implicit assumptions that would cause a car to get into a crash in real life (e.g. there's probably no examples in the training data where the car is behind a stopped car, and the driver pulls over to another lane and another car comes from behind and crashes into the driver because it didn't check its blindspot). These types of subtle biases are going to make AI-simulated world models a poor fit for training safety systems where failure cannot be represented in the training data, since they basically give models "free reign" to do anything that couldn't be represented in world model training.

420official 1 hour ago||

You're forgetting that they are also training with real data from the 100+ million miles they've driven on real roads with riders, and using that data to train the world model AI.

> there's probably no examples in the training data where the car is behind a stopped car, and the driver pulls over to another lane and another car comes from behind and crashes into the driver because it didn't check its blindspot

This specific scenario is in the examples: https://videos.ctfassets.net/7ijaobx36mtm/3wK6IWWc8UmhFNUSyy...

It doesn't show the failure mode, it demonstrates the successful crash avoidance.

MillionOClock 1 hour ago|||

While there most likely is going to be some bias in the training of those kinds of models, we can also hope that transfer learning from other non-driving videos will at least help generate something close enough to the very real but unusual situations you are mentioning. We could imagine an LLM serving as some kind of fuzzer to create a large variety of prompts for the world model, which as we can see in the article seems pretty capable at generating fictive scenarios when asked to.

As always tho the devil lies in the details: is an LLM based generation pipeline good enough? What even is the definition of "good enough"? Even with good prompts will the world model output something sufficiently close to reality so that it can be used as a good virtual driving environment for further training / testing of autonomous cars? Or do the kind of limitations you mentioned still mean subtle but dangerous imprecisions will slip through and cause too poor data distribution to be a truly viable approach?

My personal feeling is that this we will land somewhere in between: I think approaches like this one will be very useful, but I also don't think the current state of AI models mean we can have something 100% reliable with this.

The question is: is 100% reliability a realistic goal? Human drivers are definitely not 100% reliable. If we come up with a solution 10x more reliable than the best human drivers, that maybe has some also some hard proof that it cannot have certain classes of catastrophic failure modes (probably with verified code based approaches that for instance guarantees that even if the NN output is invalid the car doesn't try to make moves out of a verifiably safe envelope) then I feel like the public and regulators would be much more inclined to authorize full autonomy.

0xTJ 19 minutes ago||

Interesting, but I am very sceptical. I'd be interested in seeing actual verified results of how it handles a road with heavy snow, where the only lane references are the wheel tracks of other vehicles, and you can't tell where the road ends and the snow-filled ditch begins.

hazrmard 4 hours ago||

cue the bell curve meme for learning autonomy:

                 ____.----.____
          ______/              \______
    _____/                            \_____
    ________________________________________

    (simulations)  (real world data)  (simulations)

Seems like it, no?

We started with physics-based simulators for training policies. Then put them in the real world using modular perception/prediction/planning systems. Once enough data was collected, we went back to making simulators. This time, they're physics "informed" deep learning models.

crazygringo 2 hours ago|

That's a very interesting way of looking at it. Yes, you start with simulating something simpler than the real world. Then you use the real world. Then you need to go back to simulations for real-world things that are too rare in the real world to train with.

Seems like there ought to be a name for this, like so-and-so's law.

buddhistdude 2 hours ago||

hazrmard's law

mellosouls 4 hours ago|

Deepmind's Project Genie under the hood (pun intended). Deepmind & Waymo both Alphabet(Google) subsidiaries obv.

https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...

Discussed here,eg.

Genie 3: A new frontier for world models (1510 points, 497 comments)

https://news.ycombinator.com/item?id=44798166

Project Genie: Experimenting with infinite, interactive worlds (673 points, 371 comments)

https://news.ycombinator.com/item?id=46812933

paxys 4 hours ago|

Regardless of the corporate structure DeepMind is a lot more than just another Alphabet subsidiary at this point considering Demis Hassabis is leading all of Google AI.

More comments...