Posted by berlianta 22 hours ago

AlphaEvolve: Gemini-powered coding agent scaling impact across fields(deepmind.google)
301 points | 126 comments | page 3
AndrewKemendo 18 hours ago
From the comments, it seems that this community (mostly career software people) is starting to move into a new phase of grief about the median software engineer losing their hoped-for permanent place in society.

- 2021-2024 was Denial

- 2024-2025 was Anger and Bargaining

- 2026 seems to be some combo of anger, bargaining, and acceptance, depending mostly on your class/age

artninja1988 16 hours ago
I think we are still in the denial phase.
guybedo 17 hours ago
and yet Gemini still can't code
maxothex 21 hours ago
What I'm most curious about is how this translates to messy, real-world codebases without well-defined metrics. Most production software isn't chip design or kernel optimization: it's business logic with unclear success criteria. The infrastructure story is impressive, but I'd love to see how they handle domains where the evaluation function itself is ambiguous.
marcus_ai 21 hours ago
[flagged]
svieira 18 hours ago
> In advertising and marketing, WPP used AlphaEvolve to refine AI model components, navigating complex, high-dimensional campaign data and achieving 10% accuracy gains over their competitive manual model optimizations.

Ah good, we're getting closer and closer to Venus, Inc. every day. /s

kadam2576 21 hours ago
[flagged]
stalfie 21 hours ago
Well, even if the evaluation infrastructure is something humans could have had access to before, and the agent's key "skill" is just that it's a more patient and scalable worker, I would still argue that this "comes from the agent".

Humans get bored, impatient, or run out of time, and so often give up in what they perceive to be a decent "local minimum". Early verification harnesses that used GPT-4 to optimize robot reward functions succeeded largely because the LLM just kept going (link below). As long as it is too boring for a human to use the same evaluation infrastructure, this is still an agent skill.

https://arxiv.org/abs/2310.12931
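A toy sketch of the "patience" argument, assuming a plain stochastic hill-climbing loop rather than anything AlphaEvolve or the linked paper actually does: the same evaluator is handed to a searcher that gives up after a few non-improving proposals and to one that keeps going. The names `patient_search` and `patience` are hypothetical, and the 1-D Rastrigin function is just a standard test objective with many local minima, chosen for illustration.

```python
import math
import random
from typing import Callable, Tuple


def rastrigin_1d(x: float) -> float:
    """Toy objective with a local minimum near every integer; global minimum 0.0 at x = 0."""
    return 10 + x * x - 10 * math.cos(2 * math.pi * x)


def patient_search(
    evaluate: Callable[[float], float],
    start: float,
    budget: int,
    patience: int,
    step: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical propose/evaluate/keep-if-better loop.

    Each iteration perturbs the current best candidate, scores it with the
    fixed evaluator, and keeps it only if the score is lower. The search stops
    after `patience` consecutive non-improving proposals ("gets bored") or when
    `budget` evaluations are used up.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_x, best_score = start, evaluate(start)
    stale = 0
    for _ in range(budget):
        candidate = best_x + rng.gauss(0.0, step)  # random mutation of the best so far
        score = evaluate(candidate)
        if score < best_score:
            best_x, best_score = candidate, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # give up and settle for the current basin
    return best_x, best_score


if __name__ == "__main__":
    _, bored = patient_search(rastrigin_1d, start=3.0, budget=10_000, patience=5)
    _, tireless = patient_search(rastrigin_1d, start=3.0, budget=10_000, patience=10_000)
    print(f"impatient best score: {bored:.4f}")
    print(f"patient best score:   {tireless:.4f}")
```

With the same seed, the patient run replays exactly the same proposals as the impatient one and then simply keeps going, so its best score can only be equal or lower. Nothing about the evaluator changes; only the willingness to keep spending evaluations differs, which is the point the comment makes.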