Posted by spectraldrift 11 hours ago

Gemini 3.5 Flash(blog.google)
https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...
669 points | 483 comments | page 3
himata4113 11 hours ago|
Engineers at Google have publicly stated that the models are too big and are far from their potential. Glad they're being proven right with every release.

They continue to focus on smaller models while OpenAI and Anthropic are increasing compute requirements for their SOTA models.

stri8ed 11 hours ago||
Given the cost increase associated with this model, and previous model releases, I think the size is trending upwards, not down.
himata4113 11 hours ago||
The speed says otherwise. I think they're increasing costs since they want to start seeing ROI.
JanSt 10 hours ago||
That's (mostly) the new, faster TPUs.
himata4113 10 hours ago||
The latest TPUs appear to reach 800 tok/s rather than the advertised 300 tok/s.
mgambati 7 hours ago||
They demoed 8i today running at 1300 to 1600ish tokens per second. I imagine that is caused by having a single rack serving the model just for the demo.
himata4113 6 hours ago||
There's a limit to how much you can "scale" this process. It's linear, but napkin math based on vLLM says parallel batched streams only lose around 50% performance compared to single-stream output, so that doesn't explain the ridiculously fast numbers here.

I wish Google just came out and told us how large their flash model is, because if it's as big as or smaller than gpt-5.4-nano, that's the real headline here.
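
For anyone who wants to redo the napkin math, here is a rough sketch in Python. It assumes the advertised 300 tok/s is a per-stream speed under batching and that batching costs about 50% versus a dedicated single stream; both are assumptions taken from the figures quoted in this thread, not measurements, and the function name is made up for illustration.

```python
def dedicated_single_stream_estimate(batched_tok_s: float, batching_loss: float) -> float:
    """Estimate single-stream speed, assuming batched streams run at
    (1 - batching_loss) of the single-stream speed (hypothetical model)."""
    if not 0.0 <= batching_loss < 1.0:
        raise ValueError("batching_loss must be in [0, 1)")
    return batched_tok_s / (1.0 - batching_loss)


# Figures quoted in the thread; treated here as assumptions.
advertised_tok_s = 300.0            # advertised speed, assumed to be batched
batching_loss = 0.5                 # "around 50%" loss from batching
demo_low, demo_high = 1300.0, 1600.0  # demo speeds reported above

estimate = dedicated_single_stream_estimate(advertised_tok_s, batching_loss)
gap_low = demo_low / estimate       # how far the demo exceeds the estimate
gap_high = demo_high / estimate

print(f"dedicated estimate: {estimate:.0f} tok/s")
print(f"demo is {gap_low:.1f}x to {gap_high:.1f}x above that estimate")
```

Under those assumptions a dedicated rack gets you to roughly 600 tok/s, which still leaves the demo numbers more than 2x higher.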

Jabbles 10 hours ago|||
> Engineers at Google have publicly stated that the models are too big and are far from their potential

Can you link to a source?

himata4113 5 hours ago||
I wish I could. It was one of those YouTube podcast-type interviews with one of the engineers. There was a lot more shared, but that line stuck with me the most.
Dinux 10 hours ago|||
Source please, because I don't believe that for one second.
maipen 11 hours ago|||
Don’t let that fool you. Google will have SOTA models as big as or even bigger than their competitors.

They are just refining their current models while they finish training the next generation.

They will all come out at about the same time: Anthropic, OpenAI, Google, xAI.

ACCount37 11 hours ago||
Anthropic has been sitting on Mythos for a while now. I guess they don't feel pressured to "fuck it, ship it" until someone else gets a 10T to work.
throwa356262 10 hours ago|||
According to people who have access to Mythos, it is slightly worse than GPT-5.5-xhigh. At least for security tasks.

Hold on, I think this claim needs some hard data. Here you go gentlemen:

https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...

aesthesia 10 hours ago|||
See the later post testing a newer Mythos checkpoint, though: https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber...
throwa356262 8 hours ago||
Fair enough
ACCount37 10 hours ago|||
That claim keeps getting contradicted hard by other parties, who say Mythos beats 5.5 resoundingly on both autonomous search and discovery and the creation of complex exploit chains.

There might be a harness difference, but also, this CTF-type benchmark might not capture the capability difference fully.

nimchimpsky 6 hours ago||
[dead]
abirch 10 hours ago||||
Anthropic can sell Mythos to Fortune 500 companies and bypass the average user. I'm not sure how much is hype, but I see things like this: https://blog.cloudflare.com/cyber-frontier-models/
Sevii 11 hours ago||||
It's doubtful they have the compute to make Mythos publicly available even after the SpaceX datacenter deal. And why sell it publicly if people are still willing to pay for Opus 4.7?
outside1234 10 hours ago|||
I suspect that Mythos doesn't have a business model that works
howdareme 11 hours ago|||
Google’s pro models are almost certainly bigger than OpenAI’s lol
fikama 10 hours ago||
Why would that be? I am curious why you think that.
mnicky 9 hours ago|||
E.g. because they are behind on research and so must compensate with size to achieve a similar level of intelligence. At least this is what I heard.

For intelligence per size, only OpenAI and Anthropic are at the frontier. Google has more compute, so it can compensate for that with the size of the models...

snovv_crash 8 hours ago||
I'd argue Qwen is pushing the Pareto frontier considerably further when you take size into account.
ActorNightly 9 hours ago|||
Because TPUs are more efficient, and it's cheaper for them to field them in higher quantity since they own the chip.
ActorNightly 9 hours ago||
I mean, yes and no.

Nobody really knows which one is more optimal:

* Large model trained on a large amount of data across multiple domains, that doesn't need any extra content to answer questions.

* Smaller model that is smart enough to go fetch extra relevant content, and then operate on essentially "reformatting" the context into an answer.

stared 7 hours ago||
China: we don’t need to use US models, we can distill them ourselves

Google: we don’t need the Chinese to distill our models, we can do it ourselves

paol_taja 5 hours ago||
That pelican looks like it just sold a SaaS company and bought a bike because its therapist said it needed balance.
s3p 4 hours ago|
The pelican is ready to discuss increased synergies of bringing AI to all teams at the firm!
golfer 11 hours ago||
Here's the benchmark scoreboard they published:

https://storage.googleapis.com/gweb-uniblog-publish-prod/ori...

Alifatisk 8 hours ago||
The demo of the model in Antigravity automatically renaming and categorizing unstructured assets using vision was quite cool. It demonstrates that the IDE side panel can be used for more than just coding. I wonder if the harness in Antigravity is based on Gemini CLI or if they are completely different. Could Gemini CLI do the same task? Or is the vision feature an Antigravity thing?
mrbungie 4 hours ago|
There is now an Antigravity CLI which will replace Gemini CLI. Gemini CLI is going to be EOLd by June 18th afaik. Antigravity CLI and GUI share the same agent harness, so it might do the same task.

Source: https://developers.googleblog.com/an-important-update-transi...

razodactyl 4 hours ago||
Aw. The "listen to article" widget doesn't work properly on mobile Safari, and when using the options button, the popup appears below the "In this article" dropdown, occluding it.

At least it read the authors of the article to me.

I wish we would push more towards testing code. Agentic AI excels when it's engaged.

sbinnee 7 hours ago||
While I am excited, the price is 3x that of Gemini 3 Flash preview, which I used for the longest time. Since DeepSeek V4 Flash arrived, I have been a happy DeepSeek user. We will see how long that reign lasts after I try this new Gemini.
ElenaDaibunny 1 hour ago||
but latency in real GUI workflows with 50+ steps is still the elephant in the room for cloud-based agents
golfer 10 hours ago||
Arena.ai:

> Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers.

https://x.com/arena/status/2056793180998361233

h14h 9 hours ago|
Given how widely the number of tokens each model uses for a given task varies, "price-per-token" is essentially meaningless when doing this sort of comparison.

Artificial Analysis's "Cost to run" model (aka num_tokens_used * price_per_token) is much better, but even that is likely problematic since it's not clear whether running a bunch of benchmarks maps cleanly to real-world token use.
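
To make the difference concrete, here is a small Python sketch of the num_tokens_used * price_per_token idea. The model names, prices, and token counts are all made up for illustration, and it uses a single blended price per model as a simplification (real pricing usually separates input and output tokens).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    name: str                        # hypothetical model name
    price_per_million_tokens: float  # USD, hypothetical blended price
    tokens_used: int                 # tokens spent on the same task, hypothetical

    @property
    def cost_to_run(self) -> float:
        # cost to run = num_tokens_used * price_per_token
        return self.tokens_used * self.price_per_million_tokens / 1_000_000


runs = [
    ModelRun("model_a", price_per_million_tokens=1.00, tokens_used=40_000),
    ModelRun("model_b", price_per_million_tokens=3.00, tokens_used=10_000),
]

cheapest_per_token = min(runs, key=lambda r: r.price_per_million_tokens)
cheapest_to_run = min(runs, key=lambda r: r.cost_to_run)

for run in runs:
    print(f"{run.name}: ${run.cost_to_run:.4f} for the task")
print("cheapest per token:", cheapest_per_token.name)
print("cheapest to run:", cheapest_to_run.name)
```

With these made-up numbers, the model that is 3x more expensive per token ends up cheaper for the task because it uses a quarter of the tokens, so the two rankings disagree.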

merb 10 hours ago|
Still no new processor version for Document AI https://docs.cloud.google.com/document-ai/docs/release-notes, which is so weird. (Custom extractor)

It’s not possible to uptrain on preview releases, and it has not gotten much love for a while.
