
Posted by MallocVoidstar 22 hours ago

Gemini 3.1 Pro (blog.google)
Preview: https://console.cloud.google.com/vertex-ai/publishers/google...

Card: https://deepmind.google/models/model-cards/gemini-3-1-pro/

793 points | 826 comments
yuvalmer 18 hours ago|
Gemini 3.0 Pro is a bad model for its class. I really hope 3.1 is a leap forward.
1024core 19 hours ago||
It's been hugged to death. I keep getting "Something went wrong".
msavara 20 hours ago||
Somehow doesn't work for me :) "An internal error has occurred"
dude250711 20 hours ago||
I hereby allow you to release models not at the same time as your competitors.
sigmar 20 hours ago|
It is super interesting that this is the same thing that happened in November (ie all labs shipping around the same week 11/12-11/23).
zozbot234 19 hours ago||
They're just throwing a big Chinese New Year celebration.
vintermann 6 hours ago||
Could that actually be connected? There are a LOT of Chinese engineers and researchers working on all these models, I assume they would like to take some vacation days, and it makes sense to me to time releases around it.
PunchTornado 21 hours ago||
The biggest increase is LiveCodeBench Pro: 2887. The rest are in line with Opus 4.6, slightly better or slightly worse.
shmoogy 20 hours ago|
but is it still terrible at tool calls in actual agentic flows?
Topfi 21 hours ago||
Appears the only difference from 3.0 Pro Preview is Medium reasoning. Model naming has long since stopped even trying to make sense, but considering 3.0 is itself still in preview, increasing the number for such a minor change is not a move in the right direction.
GrayShade 21 hours ago||
Maybe that's the only API-visible change, saying nothing about the actual capabilities of the model?
xnx 20 hours ago|||
> increasing the number for such a minor change is not a move in the right direction

A .1 model number increase seems reasonable for more than doubling the ARC-AGI 2 score and improving so many other benchmarks.

What would you have named it?

Topfi 19 hours ago||
My issue is that we haven't even gotten the release version of 3.0, which is itself still in Preview, so it would make sense to stick with 3.0 until that has been deemed stable.

Basically, what does the word "Preview" mean if newer releases happen before a Preview model is stable? In prior Google models, Preview meant that there'd still be updates and improvements to said model prior to full deployment, something we saw with 2.5. Now, the designation has no meaning or reason to exist if they forgo a 3.0 that is still in Preview in favor of a new model.

xnx 18 hours ago||
Given the pace at which AI is improving, and that it doesn't give the exact same answers under many circumstances, is the [in]stability of "preview" a concern?

GMail was in "beta" for 5 years.

Topfi 10 minutes ago|||
I should have clarified initially what I meant by stable, especially because it isn't widely known how these terms are defined for Gemini models. I'm not talking about getting consistent output from a non-deterministic model, but about stability from a usage perspective, in the way Google uses the word "stable" to describe their model deployments [0]. "Preview", in regard to Gemini models, means a few very specific restrictions, including far stricter rate limits and a very tight 14-day deprecation window, making them models one cannot build on.

That is why I'd prefer for them to finish the rollout of an existing model before starting work on a dedicated new version.

[0] https://ai.google.dev/gemini-api/docs/models
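
For reference, a minimal sketch of how one might check which model names carry the preview marker, assuming the public google-generativeai Python SDK and a GOOGLE_API_KEY environment variable (both my assumptions, not something from the docs above):

    # List available Gemini models and flag which names include "preview".
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    for model in genai.list_models():
        status = "preview" if "preview" in model.name else "stable"
        print(f"{model.name}: {status}")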

verdverm 18 hours ago|||
ChatGPT 4.5 was never released to the public, but it is widely believed to be the foundation the 5.x series is built on.

Wonder how GP feels about the minor bumps for other model providers?

Topfi 7 minutes ago||
Minor version bumps are good and I want model providers to communicate changes. The issue I am having is that Gemini "preview" class models have different deprecation timelines and rate limits, making them impossible to rely on for professional use cases. That's why I'd prefer they finish the 3.0 rollout prior to putting resources into deploying a second "preview" class model.

For a stable deployment, Google needs a sufficient amount of hardware to guarantee inference and having two Pro models running makes that even more challenging: https://ai.google.dev/gemini-api/docs/models

argsnd 21 hours ago|||
I disagree. Incrementing the minor number makes so much more sense than “gemini-3-pro-preview-1902” or something.
jannyfer 20 hours ago||
According to the blog post, it should also be great at drawing pelicans riding a bicycle.
naiv 20 hours ago||
ok, so they are scared that 5.3 (pro) will be released today/tomorrow and blow it out of the water, so they rushed this out while they could still reference 5.2 benchmarks.
PunchTornado 20 hours ago|
I don't think models blow other models anymore. We have the big 3, which are neck and neck in most benchmarks, and then the rest. I doubt that 5.3 will blow the others.
scld 20 hours ago||
easy now
andrewstuart 7 hours ago||
The current Gemini version drops most of the code every time I try to use it.

Useless.

LZ_Khan 20 hours ago|
Biggest problem is that it's slow. Also, safety seems overtuned at the moment; I'm getting some really silly refusals. Everything else is pretty good.