Towards a science of scaling agent systems: When and why agent systems work

Posted by gmays 5 days ago

Towards a science of scaling agent systems: When and why agent systems work (research.google)

106 points | 36 comments | page 2

maxdo 5 days ago||

Almost feels like a paper for the sake of a paper to me.

pevansgreenwood 5 days ago||

[dead]

detroitwebsites 5 days ago||

[flagged]

_se 5 days ago||

This is so obviously LLM-generated garbage. Anyone upvoting this comment: lol.

lmf4lol 5 days ago||

"Master Open Claw in Hours, Not Months"

How old is OpenClaw again?

But your webpage is delicious. 11 blog posts only today. You all wrote them yourself?

clawsyndicate 5 days ago||

[dead]

verdverm 5 days ago|

Gonna read this with a grain of salt because I have been rather unimpressed with Google's AI products, save direct API calls to Gemini.

The rest is trash they are forcing down our throats.

4b11b4 5 days ago|

Yeah, AlphaGo and Zero were lame. The Earth foundation model - that's just ridiculous.

That's sarcasm

---

Your "direct Gemini calls" is maybe the least impressive

edit: This paper is mostly a sort of "quantitative survey". Nothing exciting enough to require a grain of salt.

verdverm 5 days ago||

The underlying models are impressive, Gemini included (via direct API calls, as opposed to the app or search). I would put AlphaGo, AlphaFold, etc. in that category too.

The products they build, which is where the agentic stuff lives, are what I find unimpressive. The quality is low, the UX is bad, and they are forced into every product. Notable examples: search in GCloud, gemini-cli, and Antigravity (not theirs technically; a $2B white-label deal with Windsurf, iirc).

So yes, I see it as perfectly acceptable to be more skeptical of Google's take on agentic systems when I find their real-world applications lackluster.

tucnak 4 days ago|||

Antigravity is not a Windsurf reskin, at least not today; it introduces many concepts and optimisations that you wouldn't find anywhere else, and in my workflows Gemini 3 Flash in Antigravity also happens to outmatch Claude Code with Opus 4.5 on some really gruesomely complicated tasks (i.e. involving compiler/decompiler work).

They are really cooking with Flash + Antigravity.

verdverm 3 days ago||

The Gemini family seems to be better at high-level generalizing.

The Claude family seems to be better at localized coding tasks.

I'm a big fan of Claude Code-based prompts with Gemini 3 Flash in my coding agent. I'm unwilling to use any new Google products at this point in time. I used to be a stan, but they have pushed me away and I'll never be one again.

4b11b4 5 days ago|||

I agree with you in general re "agentic systems". Though they might deliberately not be trying to compete in the "agent harness" space yet.

Yes, the Antigravity experiment was via Windsurf. Probably nobody expected it to take off, but the work may have surfaced some lessons worth learning from.

verdverm 5 days ago||

My hunch is that Google is past its prime, all the good PMs are gone, and now it looks like a chicken hydra with all the heads off, trying to run in multiple directions.

There is no clear vision, coherence, or confidence that the products will be around in another year.

nawgz 5 days ago||

Kind of a weird take given they are one of the strongest AI providers and the most vertically integrated. Sure, maybe the company isn't as healthy as it once was, but none of them are; late-stage capitalism is rotting most foundations.

verdverm 5 days ago||

I'm saying this as a big, but dimming, Google stan.

Their poor product decisions have driven me away, but that doesn't mean I'm not still very impressed with everything under that. I'm building my custom agent on their open source Agent Development Kit and the Gemini family.