Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

Posted by jetter 8 hours ago

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark (modelrift.com)

295 points | 114 comments

jhot 7 hours ago

Last weekend I bought my wife a bike off Marketplace. It was in good condition but was missing one of the internal cable-routing grommets. I gave Claude pictures of the pill-shaped hole by itself and with my digital calipers measuring it in the long and short directions.

Gave it a short prompt and it gave me an OpenSCAD model with everything parametrized. I printed it in TPU with no changes and it was nearly perfect on the first try. Claude had subtracted 0.3 mm in the x/y dimensions; I lowered that to 0.1 mm and now it's perfect.

Much easier shape than ancient Roman architecture but still very cool how easy it was.
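For anyone wanting to try this kind of part, here is a minimal sketch of the parametric idea, not the model Claude actually produced. It assumes the grommet is a plain stadium ("pill") plug, that the clearance is subtracted once from each overall dimension rather than per side, and it writes OpenSCAD source that builds the outline as `hull()` of two `circle()`s inside `linear_extrude()`. `PillPlug` and the example measurements are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PillPlug:
    """Hypothetical stadium ("pill") plug sized from caliper readings, in mm."""

    hole_long: float         # measured length of the hole
    hole_short: float        # measured width of the hole
    height: float            # plug thickness
    clearance: float = 0.3   # assumed: subtracted once from each overall dimension

    def __post_init__(self) -> None:
        if self.hole_long < self.hole_short:
            raise ValueError("hole_long must be >= hole_short")
        if not 0 <= self.clearance < self.hole_short:
            raise ValueError("clearance must be non-negative and smaller than the hole")

    @property
    def diameter(self) -> float:
        # The rounded ends share the plug's (reduced) short dimension.
        return self.hole_short - self.clearance

    @property
    def center_offset(self) -> float:
        # Distance along x from the origin to each end circle's center.
        length = self.hole_long - self.clearance
        return (length - self.diameter) / 2

    def to_scad(self) -> str:
        # hull() of two circles gives the stadium outline; linear_extrude() makes it solid.
        d, x = self.diameter, self.center_offset
        return (
            f"linear_extrude(height={self.height:.2f})\n"
            "  hull() {\n"
            f"    translate([{-x:.2f}, 0]) circle(d={d:.2f}, $fn=64);\n"
            f"    translate([{x:.2f}, 0]) circle(d={d:.2f}, $fn=64);\n"
            "  }\n"
        )


if __name__ == "__main__":
    # Hypothetical caliper readings with the 0.1 mm clearance mentioned above.
    print(PillPlug(hole_long=12.0, hole_short=6.0, height=3.0, clearance=0.1).to_scad())
```

Keeping clearance as its own parameter is what makes the 0.3 mm to 0.1 mm tweak a one-number change instead of an edit to every dimension.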

simplyluke 7 hours ago

Yeah, CAD has been my personal example of "oh, the barrier to entry for this skill was high enough that I never did it, and now I can be passably bad at it, enough to get some simple things done".

I've had similar experiences making simple functional parts on a 3D printer with OpenSCAD + LLMs. I'm very aware that the models are worse at it than at, say, generating React code, and I'm also the antithesis of a skilled pilot. It's still cool, and it has resulted in me starting to learn a new skill at a hobby level.

dempedempe 4 hours ago

It's like this with a lot of things now. For example, Nix's learning curve used to be a huge barrier to entry. Now, with LLMs, I'm using nix-darwin and home-manager for dotfiles and package management, and I have individual flakes in all of my projects for cryptographically reproducible builds!

rlt 3 hours ago

Nit: there’s nothing “cryptographic” about reproducible builds.

“Reproducible build” already usually implies bit-by-bit reproducibility.

dempedempe 1 hour ago

I meant that with Nix you're comparing hashes, whereas with Docker you're using pinned versions.
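The hashes-versus-pins distinction, sketched minimally with Python's `hashlib` and made-up artifact bytes: a version pin only matches a name, while a recorded digest matches exact content. This illustrates the idea only; it is not how Nix or Docker are implemented, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_hash(data: bytes, recorded_sha256: str) -> bool:
    # Content check: any change to the bytes changes the digest.
    return sha256_hex(data) == recorded_sha256.lower()


def matches_version(label: str, pinned: str) -> bool:
    # Label check: passes whenever the name matches, whatever the bytes are.
    return label == pinned


# Two hypothetical builds published under the same version label.
original = b"example-tool 1.2.3, build A"
rebuilt = b"example-tool 1.2.3, build B"
recorded = sha256_hex(original)

print(matches_version("example-tool==1.2.3", "example-tool==1.2.3"))  # True
print(matches_hash(original, recorded))  # True
print(matches_hash(rebuilt, recorded))   # False
```

Both builds pass the version check, but only the byte-identical one passes the hash check.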

bt1a 2 hours ago

I thought it mainly implied architectural/hardware compatibility and deterministic output.

pimeys 2 hours ago

Nix is also great at work. You keep the server Nix code in the same repo, and OpenCode can just change and test the server config.

0x696C6961 6 hours ago

Learning to make simple parts in Onshape is pretty darn easy (and fun).

jeffbee 4 hours ago

Yeah. I teach this after school to 7th grade kids. Anyone can pick this up in a few hours.

chalupa-supreme 4 hours ago

They taught us to make Lego bricks with CAD when I was in 6th grade. I wish I'd retained more of that, and that it were more widely taught.

jeffbee 1 hour ago

I am reasonably confident that access to solid modeling and additive fabrication is now more widespread than ever.

skinner927 3 hours ago

Claude does well if you can provide all the dimensions. It fails at guessing, though. The real magic is when you can provide one dimension, or a photograph with a ruler in it, and the AI figures out the rest. Right now, Claude at least is pretty bad at guessing.

jetter 7 hours ago

These small functional prints are exactly where OpenSCAD and LLM generation shine.

jonah 4 hours ago

I was recently trying to get models to generate a 3D fortune cookie: Claude in three.js and Gemini in OpenSCAD. Neither really got the concept or came very close at all. It's a surprisingly complex shape, I guess.

8note 1 hour ago

For that shape you probably want something that's good at bends/fabric, because you'd start with the flat shape, then set some constraints that certain edges are collinear.

amelius 6 hours ago

Does it optimize for no support?

05 6 hours ago

You optimize for no support when selecting the print orientation (but for anything semi-cylindrical like the part described, that would be the only sane orientation, and the one the slicer would choose when you smash the 'Auto Orientation' button).

jlhawn 2 hours ago

> Antigravity was the only autonomous agent that implemented the Pantheon’s signature interior ceiling pattern: repeated square coffers visible through the oculus.

That is seriously impressive. I looked at the 3D model and didn't even think to LOOK INSIDE the building before reading this.

Here's [1] the 3D model with `show_cutaway` enabled.

[1] https://modelrift.com/models/pantheon-benchmark-antigravity-...

hereme888 2 hours ago

Was just going to say... I looked inside by accident, and it gives a better impression of intelligence and effort than the outside does.

mellosouls 8 hours ago

Antigravity may well Top the whatever benchmark but:

My Antigravity (forced) replacement for Gemini CLI requires me to log in via the browser every time I use it, and my Antigravity IDE won't update at all, so:

If it's OK, I'd prefer they just work on reaching a baseline of acceptable rollout before worrying about being Top in anything.

PS, actual title:

OpenSCAD LLM Benchmark: Building the Pantheon

jetter 8 hours ago

I agree. My main concern regarding Google AI products is the endless pain around the UX of login / billing / upgrades / product sunsets... but their LLM models are good, and Antigravity 2.0 is not that bad either (unless you lost all your Antigravity 1.0 setup and projects, like many people did).

mchusma 2 hours ago

I just left Google I/O feeling less confident about Google's execution here.

- Gemini 3.5 Flash is strange: old cutoff, better than 3.1 Pro at some things and worse at others, sometimes cheaper and sometimes more expensive than 3.1 Pro.

- Antigravity had seemed abandoned, and people speculated that they would cut it off; they kind of did, migrating everyone to a new Antigravity.

- Google "shipped the org chart": they have so many AI products and none seems best of breed (e.g. the Gemini integration in Google Docs is worse than Claude).

I was actually hoping for an "Opus-level intelligence at Haiku cost" model or "Sonnet-level performance at Gemini 3.0 pricing". Either would have been a workhorse, plus a competitor to Claude/Codex (one app to do things). I got neither.

pelagicAustral 8 hours ago

I just use Claude Code and IntelliJ, so I don't understand why so many people complain about Antigravity ditching VS Code. What surface isn't covered by using Antigravity CLI + VS Code (or any other IDE)?

jeromegv 6 hours ago

Gemini CLI was open source; Antigravity CLI is not. It's not at feature parity and is missing many features, yet we are forced to migrate away from Gemini CLI before Antigravity CLI is ready.

surajrmal 6 hours ago

The difference in its ability is immense. Even with fewer features, it makes a lot of sense to switch. It really shows that the harness matters almost as much as the model.

lern_too_spel 2 hours ago

At least one of the missing features is a basic piece of functionality (showing token quota used). Without it, you're pretty much guaranteed to get locked out for a week with no warning.

freedomben 8 hours ago

I'm not GP, but I am somewhat excited about Antigravity CLI. I adopted Gemini CLI early and really liked it, though over time it got dumber and dumber, until I realized it was foolish to use it instead of Claude/Codex. I'm hopeful that Antigravity CLI won't go down that path, but I also can't shake some skepticism.

jeromegv 6 hours ago

I don’t think it’s the CLI that got dumber, just the model it was using. They drastically reduced limits on their best model, so that’s likely why you got stuck downgrading models and getting worse results.

WarmWash 5 hours ago

I sense that, in reality, there is a difficult trade-off behind the scenes between quantization and usage limits. You can have a "smart" model but poor limits, or good limits and a "dumb" model.

This seems very similar to mobile data limits (remember those years?), where there wasn't enough tower bandwidth to serve everyone unlimited data, so telcos were in constant tension between data caps and bandwidth throttling.

It wasn't until 5G came along with 100x network capacity that they could finally give everyone "unlimited" data.

arthurtully 57 minutes ago

I've got an AI Pro plan and haven't been able to log in for months. Endless checking in with my Google support guy. At least Dinesh wishes me good health every week, so that's nice.

freedomben 8 hours ago

Having my workflow disrupted is the main reason I never adopted Antigravity, despite liking it. I'm glad to see Google is invested, but the older I get, the more protective I am of my workflow.

hootz 7 hours ago

And the only realistic way to protect our workflow is by avoiding vendor lock-in like the plague.

freedomben 32 minutes ago

Exactly. I admit it's a bit extreme, but this is a big reason why I insist that Neovim is my IDE and won't adopt anything else. If I can't make a tool work in Neovim, I move on to something else (unless I have no choice, but that happens very rarely at this point).

VectorLock 7 hours ago

The forced upgrade from Gemini CLI, which I liked as much as Claude Code and in some ways better, was bad. But them just sending out that email on Wednesday that basically said "Thanks for subscribing to Google One AI Pro, as of right now we're adding limits to your account. Tough shit you get nothing." left a REALLY bad taste in my mouth. I had previously praised the "AI Pro" subscription as a good value.

leoedin 7 hours ago

I quit AI Pro earlier this year for the same reason. I went to use it one day (I don't think I'd even used it much in the preceding week) and found that my limits had been reduced overnight and my usage was already too high. I had something like a 7 day wait until it reset.

I get you have to change limits, but reducing limits in a way which both applies retroactively and has a really long reset period is just infuriating. If they'd applied the new limits more gently or at the next billing period I'd probably have continued paying.

I don't mind paying a fair price for a service that provides value, but I really hate having a service I think I'm paying for rug-pulled with no clear justification.

the_real_cher 8 hours ago

Wild that it doesn't cache the creds.

elaus 7 hours ago

Just to clarify: I believe it should cache them (it works for me).

So far I like it much more than Gemini CLI (my previous daily driver for personal projects). Seems more mature and "feels more intelligent" (very subjective ofc)

timdorr 4 hours ago

It does. It uses go-keyring under the hood, which has its own issues with certain systems.

If you're on WSL, getting dbus to work is a PITA. There may be other OS-level issues that folks are running into.

dezgeg 4 hours ago

It requires a keyring service being installed (accessed over dbus) and if there isn't one it just silently doesn't store them anywhere. Pretty bad UX.

littlecranky67 7 hours ago

My (unfounded) guess is that this is to prevent usage by other tools/OpenClaw. The browser login will include fingerprinting to make sure you are a human.

stuaxo 7 hours ago

"Pantheon", bloody hell. Why are the people writing these articles so up themselves? It's so overbearing.

tpmoney 7 hours ago

The article is literally about asking these models to generate 3d models of the Pantheon.

tjoff 46 minutes ago

I've had such a bad time trying to do this myself. You might get a halfway decent draft on the first try, and then you start to "debug" it, and after a very frustrating session you realize that the model can't properly "see" the results. That is, you just can't iterate on it at all.

I'm guessing that most harnesses/tools resize an image before processing it, and in doing so lose enough detail to make it much harder to reason about, especially with wireframe images.

I'm sure I'm holding it wrong, but this test didn't really test that. It was just a one-off. That breaks down pretty quickly, especially if you don't have reference pictures of what you are trying to create.

ponyous 5 hours ago

I've run tons of benchmarks for OpenSCAD with all kinds of models and setups, and what I realized is:

- Models are very jagged (they might excel at one type of 3D model, but not another)

- Gemini models are the least jagged in my experience and have the best image understanding

- Gemini models are also the most creative (which may be undesirable if you want a precise CAD part)

- Overall, this benchmark doesn't prove much, because one 3D model (and one attempt) is just not enough. I usually test on at least a dozen 3D models, each generated 3 times, and should really do much more, but it's too pricey for a solo dev.

Still, thanks for publishing this. I'll definitely run Flash 3.5 soon to see how it performs.
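The repeated-runs point can be made concrete with a minimal sketch, not the commenter's actual harness. It assumes each attempt has already been given a numeric score by some grader (hypothetical), then reports per-LLM mean, sample standard deviation and attempt count, plus per-part means that expose the jaggedness described above. All names are illustrative.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One benchmark attempt: (llm_name, target_part, score). Names and score scale are hypothetical.
Run = Tuple[str, str, float]


def summarize(runs: Iterable[Run]) -> Dict[str, Tuple[float, float, int]]:
    """Per-LLM (mean score, sample standard deviation, number of attempts)."""
    by_llm: Dict[str, List[float]] = defaultdict(list)
    for llm, _target, score in runs:
        by_llm[llm].append(score)
    out = {}
    for llm, scores in by_llm.items():
        # With a single attempt the spread is unknown, so report NaN rather than 0.
        spread = statistics.stdev(scores) if len(scores) > 1 else math.nan
        out[llm] = (statistics.mean(scores), spread, len(scores))
    return out


def per_target_means(runs: Iterable[Run]) -> Dict[Tuple[str, str], float]:
    """Mean score for each (LLM, target part) cell, showing uneven strengths."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for llm, target, score in runs:
        cells[(llm, target)].append(score)
    return {key: statistics.mean(scores) for key, scores in cells.items()}
```

Reporting NaN for a single attempt makes the "one attempt is not enough" problem visible in the output instead of hiding it behind a spread of zero.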

1970-01-01 5 hours ago

Creating a single real-world object and declaring it a benchmark? No, it doesn't work that way for a robust tool. You need to do something like Iron Chef, with a Greek architecture theme and a panel or judge that declares the winner. This is just seeing which tool subjectively makes the best-looking Pantheon.

Eridrus 5 hours ago

Yeah, this is less of a benchmark and more "I like this one guys!".

Just totally subjective grading criteria applied to a single, poorly defined example, with no end use case in mind to guide how to even do the evaluation.

dhfbshfbu4u3 8 hours ago

Still a long way from shorting Autodesk.

As a side note Autodesk released an agentic assistant back in December for Fusion. Six months later it is still quite bad.

hobofan 7 hours ago

It is almost comically bad. I've had a few simple parts to design for 3D printing in the last few weeks and tried it on them (each is about 4 operations on the timeline), and it never created anything close to what I was trying to do, even when I spelled it out step by step using Fusion's naming.

At this point I'm not even sure if it can properly create a simple primitive solid.

blorenz 4 hours ago

Have you tried the Fusion MCP that launched last month? https://aps.autodesk.com/blog/bringing-fusion-claude-creativ...

shideneyu 4 hours ago

Still a long way to go, but I'm sure it will get there eventually.

sjia 55 minutes ago

Isn't CadQuery closer than OpenSCAD to professional, traditional CAD / mechanical engineering workflows? Which model (ChatGPT, Gemini, or Claude Code) is better for CadQuery code generation?

seniorsassycat 2 hours ago

I tried having Claude Code design a snap-fit box printed in vase mode. It ultimately didn't work out: it couldn't get the tolerances right and kept designing features that wouldn't print in vase mode.

SCAD needs unit tests. It would be powerful to assert that a profile has no slope greater than 45°, that the intersection of two objects is empty, or that a part has a specific volume.

It also needs cutaway views. I got okay results using boxes to remove everything except a sliver, to view a slice and internal details. But without hash marks, texture, or outlines, it can be hard to make out the forms.
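Part of the unit-test wish above can be approximated outside OpenSCAD. A minimal Python sketch, assuming the part is a `rotate_extrude()` of a 2D (radius, z) profile kept as a plain point list ordered bottom to top: it checks the worst wall angle against a 45° limit and computes the enclosed volume by summing conical frustums. The function names and the example profile are hypothetical, and the intersection check is not covered.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (radius, z) in mm, ordered bottom to top


def max_overhang_deg(profile: Sequence[Point]) -> float:
    """Largest wall angle from vertical between consecutive profile points."""
    worst = 0.0
    for (r0, z0), (r1, z1) in zip(profile, profile[1:]):
        dz = z1 - z0
        if dz < 0:
            raise ValueError("profile must be ordered bottom to top")
        # 0 deg = vertical wall, 90 deg = flat ledge; leaning in or out both count.
        worst = max(worst, math.degrees(math.atan2(abs(r1 - r0), dz)))
    return worst


def revolved_volume(profile: Sequence[Point]) -> float:
    """Volume enclosed when the profile is spun about the z axis and capped at the axis.

    Each segment sweeps a conical frustum: V = pi * h * (r0^2 + r0*r1 + r1^2) / 3.
    """
    total = 0.0
    for (r0, z0), (r1, z1) in zip(profile, profile[1:]):
        h = z1 - z0
        total += math.pi * h * (r0 * r0 + r0 * r1 + r1 * r1) / 3
    return total


def check_printable(profile: Sequence[Point], limit_deg: float = 45.0) -> None:
    """Raise AssertionError, test-style, if any segment exceeds the overhang limit."""
    worst = max_overhang_deg(profile)
    if worst > limit_deg + 1e-9:  # small tolerance for floating-point rounding
        raise AssertionError(f"overhang of {worst:.1f} deg exceeds {limit_deg} deg")


if __name__ == "__main__":
    # Hypothetical cup: straight wall, then a 45 degree flare at the rim.
    cup = [(20.0, 0.0), (20.0, 40.0), (30.0, 50.0)]
    check_printable(cup)
    print(f"{revolved_volume(cup):.1f} mm^3")
```

Checking the profile data rather than the rendered mesh keeps the test cheap, at the cost of only covering parts built by revolving a known point list.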

gbgarbeb 47 minutes ago

"Vase mode snap-fit box" sounds to me like "flexible concrete".

seemaze 2 hours ago

I'm unconvinced: this is one of the most iconic historical buildings, with tomes written about it and plenty of existing photographs and public models to train on.

I would be more interested in benchmarking the modeling of an anonymous structure based on provided references alone. It kind of feels like the shallow magic of watching an LLM one-shot a to-do app.
