Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

Posted by jetter 10 hours ago

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark (modelrift.com)

309 points | 118 comments | page 3

ReptileMan 9 hours ago

The only thing moving faster than AI these days is the goalposts. Three years ago we would have been amazed if models were able to produce anything; now we have the luxury of nitpicking. Even the worst entries in the benchmark are quite impressive.

alnwlsn 4 hours ago

Using reference images is a huge step for this sort of thing. The text-only approaches I've seen before were never going to be that good even with "perfect" AI, simply because describing 3D objects in text is not something that anyone is really any good at.

WarmWash 7 hours ago

I remember getting wound up about latency and server issues playing Counter-Strike in the early '00s. At the same time though, it was hard to justify being angry because playing a multiplayer game with friends who were scattered all over town was something that had to be real magic.

I guess the wow!->adjust->complain->wow!->... cycle is endless as a human.

ramon156 9 hours ago

No one asked for faster horses; they still became obsolete when cars came. Nothing new.

happyopossum 6 hours ago

> No one asked for faster horses

Err, yes they did. Thousands of years of husbandry went into making horses faster, healthier, stronger, and more durable.

I think the quote you’re looking for is “if I had asked people what they wanted, they would have said faster horses”. It’s attributed to Henry Ford, although there is debate about whether or not he said it.

The point of the quote is that “faster horses” is the consumer response to “how do I get more work done” as it comes from the viewpoint of “how am I doing my work now”. An ingenious mind looks at the desired outcome and works backwards and may come to a different and dramatically improved solution instead of merely improving the current tool.

LatencyKills 9 hours ago

Things mature, and expectations grow appropriately. That is true of more than just LLM performance.

xnx 9 hours ago

Sure, but it's good to have some perspective and some awe that any of this would've been absolute unbelievable magic just 3 years ago. Even if all AI progress stopped immediately, we'd need 10 years to digest and incorporate the technology.

nutjob2 5 hours ago

Why look back in awe when technological innovation will just keep accelerating? Soon what we have today will seem quaint. Best to keep looking forward with impatience and discontent.

LatencyKills 8 hours ago

As someone who's been building developer tools (Visual Studio and Xcode) for 25 years, I don't have a perspective problem. We were doing "code completion" back in the 90s and could never have predicted that an LLM would write code at the current level of quality.

My point is that with every new model release, the expectations grow. I don't know how else to say that.

nutjob2 5 hours ago

Welcome to human nature.

jdw64 9 hours ago

To be brutally honest, I'm disappointed with Antigravity. It feels incredibly un-Google-like. The AI billing models are fragmented, and the Antigravity IDE is currently tripping over something as trivial as a basic Electron deployment config bug.

Don't get me wrong, I don't think AI coding is a bad thing. For East Asians like myself, it levels the playing field with Westerners, so as long as you rigorously review the AI's output, it's a perfectly viable tool.

However, the absolute farce we just witnessed with the Antigravity 2.0 update really raises doubts about whether 'vibe coding' can actually be trusted. If even a behemoth like Google is dropping the ball like this, it says a lot.

NortySpock 7 hours ago

> I don't think AI coding is a bad thing. [...] it levels the playing field [...]

I'd like to put regional differences aside and say AI coding / LLMs are incredible tools.

While I'm nervous about my job as a programmer being able to pay a prevailing wage after the dust settles, I do hope that everyone gaining access to an AI coder / tutor will allow anyone to be able to achieve things they previously only dreamed of. If the tutor costs pennies per session, sure, the tutors are out of work, but I hope everyone can thus up-skill to work on the challenges they actually want to work on.

I'm taking baby-steps into coding in Elixir on the other monitor, a language I had only read about before, because an LLM is walking me through the changes, answering my questions, and accepting my rebuttals. There's no way I would have time to pick up the language otherwise.

Yesterday I vibe-coded some additions to the static site generator Python script for my blog. It was awesome to be able to think in terms of desired features instead of digging around documentation for libraries and syntax.

embedding-shape 9 hours ago

> AI billing models are fragmented ... IDE is currently tripping over something as trivial ... farce we just witnessed with the antiGravity2.0 update

I'm sorry, but that sounds exactly like almost every single Google "product" out there. They seem to only care about throwing stuff over the wall as quickly as possible, and you'd have a hard time finding a single Google product that doesn't also feel filled with fragmented choices, as if every project of theirs has a different project manager every week.

nutjob2 5 hours ago

> For East Asians like myself, it levels the playing field with Westerners

Why do you say that? Are there language or cultural disadvantages to being East Asian?

jdw64 2 hours ago

[flagged]

Onplana 7 hours ago

Going to try it. Just downloaded. Will see how it compares to Claude Code.

anony-123 7 hours ago

So, given this benchmark, does it mean Antigravity is better than Claude Code with the Opus model? I once tried Antigravity and it was just disappointing.

nycdatasci 9 hours ago

And yet 300+140=460. A very jagged surface indeed. https://gemini.google.com/share/c2a187275e26

sigbeta 5 hours ago

Why would you use an LLM for this? They are non-deterministic models.

This was also probably part of an extended prompt that disallowed coding; Gemini always does calculations with a little Python snippet because it is deterministic and accurate.
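
For illustration, a minimal sketch of the kind of small Python snippet meant here. The function name `add` is hypothetical and this is not claimed to be what Gemini actually runs; it only shows why handing the arithmetic to code gives the same answer every time.

```python
# Hypothetical illustration: plain integer arithmetic is deterministic,
# unlike sampling an answer token by token from a language model.
def add(a: int, b: int) -> int:
    """Return the exact integer sum of a and b."""
    return a + b


# The sum discussed in this thread.
result = add(300, 140)
print(result)  # 440
```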

dist-epoch 8 hours ago

Was that part of a bigger prompt?

Flash 3.5 fails exactly like in your sample: https://gemini.google.com/share/97521a8752d9

but Flash 3.1 Lite initially fails, then corrects itself: https://gemini.google.com/share/dc0889ec85ba

happyopossum 6 hours ago

No matter what I try I can’t get Gemini to give me the incorrect result. Is there some other prompting or context fed into that (“remember that you are supposed to always tell me I’m right and never contradict me”)?

sigbeta 5 hours ago

There was definitely a pre-prompt fed to that. I cannot reproduce this result on either 3.1 Flash or 3.5 Flash.

dilap 6 hours ago

Why Codex GPT-5.5 High instead of Extra High, I wonder?

u8 7 hours ago

It's crazy how I can see articles like this, but in my practical everyday use Antigravity is a horrible consumer experience. The TUI is broken. You cannot type input while the model is outputting text, otherwise both get messed up and the TUI renders a sickly blob of text. There are no keyboard shortcuts to switch between planning and execution mode, or a way to directly load skills.

The usage limits are too aggressive, too. I tried to generate a quick Deno Fresh website to act as a redirect to my GitHub from socials (literally the simplest possible thing I could have asked of it) and it chewed through my five-hour limit in tokens from scaffolding.

To me, as a developer of CLI developer tooling, it's obvious not a lot of thought or testing went into this product, but as Google has said before: "the models are the product".

spiderfarmer 9 hours ago

Next month they'll be beaten again.

And next year Google will probably sunset Antigravity.

If it doesn't make Google billions, don't trust them.

lern_too_spel 4 hours ago

Why should I care if they sunset it? I switch between multiple agentic coding tools on the same projects, sometimes several times per day. The cost of switching is basically zero.

PunchTornado 9 hours ago

Plenty of Google products don't make billions and they are still alive.

serf 9 hours ago

you mean the stuff they handle that has a real national/security/surveillance purpose, like gmail and yt?

I can't imagine why (or who) that'd be kept alive for..

funny how some of their projects have undisclosed budgets and profits.

toasty228 9 hours ago

Which ones are not massive data traps or ad delivery mechanisms?

smcl 9 hours ago

Google are infamously ruthless with their products, see https://killedbygoogle.com/

bobbycastorama 8 hours ago

Why are half of the comments on Hacker News from stereotypical AI-bros whose lives revolve around tech, and the other half from sceptical commentators whose lives also revolve around tech but who are disappointed with its performance?!

Where are the normal people :/

alnwlsn 5 hours ago

The normal people are the ones not writing comments, but I'll give you one 'cause you asked:

I'm a Solidworks user. Most Solidworks or other pro CAD users would consider OpenSCAD kind of like MS Paint. Yes, you can draw the Mona Lisa in it, but it doesn't really work the same way.

Even so, the examples shown here are better than what I've seen before. They seem to be on the right track using images instead of long paragraphs of text to try to describe the object. They are still missing the constraints and dimensions that come naturally to pro CAD users (it can be done manually in OpenSCAD of course), but if you're just making a video game it's probably going to be fine for that.

frank00001 8 hours ago

We are just reading the comments.

sigbeta 5 hours ago

"Normal people" probably do not fall in the ballpark of the HN target audience.

I'd say it's 50/50 pessimistic and optimistic, with pessimistic attracting more attention because of human nature.

andybak 7 hours ago

Why would a non-tech person be on Hacker News? Isn't the clue in the name?

EasyMark 5 hours ago

The people in the middle are still waiting to see; mostly it’s the extremes that are fully vested and loudest on the internet.

JumpCrisscross 5 hours ago

> Where are the normal people

Not using OpenSCAD?

elorant 8 hours ago

Both parts seem pretty normal to me.

robert_ddsbos 7 hours ago

[flagged]
