Posted by meetpateltech 12/17/2025

Gemini 3 Flash: Frontier intelligence built for speed (blog.google)
Docs: https://ai.google.dev/gemini-api/docs/gemini-3

Developer Blog: https://blog.google/technology/developers/build-with-gemini-...

Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/

Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...

Deepmind Page: https://deepmind.google/models/gemini/flash/

1102 points | 580 comments
caminanteblanco 12/17/2025|
Does anyone else understand what the difference is between Gemini 3 'Thinking' and 'Pro'? Thinking "Solves complex problems" and Pro "Thinks longer for advanced math & code".

I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.

peheje 12/17/2025||
I think:

Fast = Gemini 3 Flash without thinking (or very low thinking budget)

Thinking = Gemini 3 Flash with high thinking budget

Pro = Gemini 3 Pro with thinking

sunaookami 12/17/2025||
It's this, yes: https://x.com/joshwoodward/status/2001350002975850520

>Fast = 3 Flash

>Thinking = 3 Flash (with thinking)

>Pro = 3 Pro (with thinking)

caminanteblanco 12/17/2025||
Thank you! I wish they had clearer labelling (or at the very least some documentation) explaining this.
flakiness 12/17/2025|||
It seems:

   - "Thinking" is Gemini 3 Flash with higher "thinking_level"
   - "Pro" is Gemini 3 Pro. It doesn't mention "thinking_level", but I assume it is set to high-ish (rough sketch of the mapping below).
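In API terms, a minimal sketch with the @google/genai JS SDK (the model ids and the exact label-to-setting mapping are my guesses, not something Google documents):

    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    // "Thinking" in the app ~= Gemini 3 Flash with a high thinking level (assumption)
    const thinking = await ai.models.generateContent({
      model: "gemini-3-flash-preview", // model id is a guess
      contents: "Prove that sqrt(2) is irrational.",
      config: { thinkingConfig: { thinkingLevel: "high" } },
    });

    // "Pro" in the app ~= Gemini 3 Pro, presumably also thinking at a high-ish level
    const pro = await ai.models.generateContent({
      model: "gemini-3-pro-preview", // model id is a guess
      contents: "Prove that sqrt(2) is irrational.",
      config: { thinkingConfig: { thinkingLevel: "high" } },
    });

    console.log(thinking.text, pro.text);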
lysace 12/17/2025||
Really stupid question: how is Gemini-style 'thinking' different from artificial general intelligence (AGI)?

When I ask Gemini 3 Flash this question, the answer is vague but agency comes up a lot. Gemini thinking is always triggered by a query.

This seems like a higher-level programming issue to me. Turn it into a loop. Keep the context. Those two things make it costly for sure. But does it make it an AGI? Surely Google has tried this?
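Concretely, something like this (a hand-wavy sketch; callModel and runTool are made-up placeholders, not any real API):

    // Naive agent loop: keep the whole history and keep calling the model until it says it's done.
    type Message = { role: "user" | "assistant" | "tool"; content: string };
    type ModelReply = { content: string; done: boolean; toolCall?: string };

    // Hypothetical stand-ins for a real LLM call and for tool execution:
    declare function callModel(history: Message[]): Promise<ModelReply>;
    declare function runTool(call: string): Promise<string>;

    async function agentLoop(task: string): Promise<string> {
      const history: Message[] = [{ role: "user", content: task }];
      while (true) {
        const reply = await callModel(history);          // each query triggers "thinking"
        history.push({ role: "assistant", content: reply.content });
        if (reply.done) return reply.content;            // stop when the model says it's finished
        if (reply.toolCall) {
          history.push({ role: "tool", content: await runTool(reply.toolCall) });
        }
      }
    }

That's all I mean by "turn it into a loop" and "keep the context".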

CamperBob2 12/17/2025|||
I don't think we'll get genuine AGI without long-term memory, specifically in the form of weight adjustment rather than just LoRAs or longer and longer contexts. When the model gets something wrong and we tell it "That's wrong, here's the right answer," it needs to remember that.

Which obviously opens up a can of worms regarding who should have authority to supply the "right answer," but still... lacking the core capability, AGI isn't something we can talk about yet.

LLMs will be a part of AGI, I'm sure, but they are insufficient to get us there on their own. A big step forward but probably far from the last.

bananaflag 12/17/2025||
> When the model gets something wrong and we tell it "That's wrong, here's the right answer," it needs to remember that.

The problem is that once we figure out how to do this, each copy of the original model will diverge in wildly unexpected ways. Just as we have 8 billion different people in this world, we'll have 16 gazillion different AIs, all of them interacting with each other and remembering all those interactions. This world scares me greatly.

criley2 12/17/2025||||
Advanced reasoning LLMs simulate many parts of AGI and feel really smart, but fall short in many critical ways.

- An AGI wouldn't hallucinate, it would be consistent, reliable and aware of its own limitations

- An AGI wouldn't need extensive re-training, human reinforced training, model updates. It would be capable of true self-learning / self-training in real time.

- An AGI would demonstrate real genuine understanding and mental modeling, not pattern matching over correlations

- It would demonstrate agency and motivation, not be purely reactive to prompting

- It would have persistent, integrated memory. LLMs are stateless and driven by the current context.

- It should even demonstrate consciousness.

And more. I agree that what we've designed is truly impressive and simulates intelligence at a really high level. But true AGI is far more advanced.

waffletower 12/17/2025|||
Humans can fail at some of these qualifications, often without guile: they are not always consistent or aware of their own limitations, and people do not universally demonstrate effective understanding and mental modeling.

I don't believe the "consciousness" qualification is at all appropriate, as I would argue that it is a projection of the human machine's experience onto an entirely different machine with a substantially different existential topology -- relationship to time and sensorium. I don't think artificial general intelligence is a binary label which is applied if a machine rigidly simulates human agency, memory, and sensing.

versteegen 12/18/2025||||
> - It should even demonstrate consciousness.

I disagreed with most of your assertions even before I hit the last point. This is just about the most extreme thing you could ask for. I think very few AI researchers would agree with this definition of AGI.

lysace 12/17/2025|||
Thanks for humoring my stupid question with a great answer. I was kind of hoping for something like this :).
dcre 12/17/2025||||
This is what every agentic coding tool does. You can try it yourself right now with the Gemini CLI, OpenCode, or 20 other tools.
andai 12/17/2025|||
AGI is hard, but we can solve most tasks with artificial stupidity in an `until done` loop.
lysace 12/18/2025||
Just a matter of time and cost. Eventually...
xpil 12/17/2025||
My main issue with Gemini is that business accounts can't delete individual conversations. You can only enable or disable Gemini, or set a retention period (3 months minimum), but there's no way to delete specific chats. I'm a paying customer, prices keep going up, and yet this very basic feature is still missing.
testfrequency 12/17/2025||
This is the #1 thing that keeps me from going all in on Gemini.

Their retention controls for both consumer and business suck. It’s the worst of any of the leaders.

strstr 12/18/2025|||
For my personal usage of AI Studio, I had to use AutoHotkey to record and replay my mouse deleting my old chats. I thought about cooking up a browser extension, but never got around to it.
ComputerGuru 12/18/2025||
Use it over the API.
outside2344 12/17/2025||
I don't want to say OpenAI is toast for general chat AI, but it sure looks like they are toast.
Gigachad 12/17/2025||
I’ve fully switched over to Gemini now. It seems significantly more useful, and is less of an automatic glaze machine that just restates your question and tells you how smart you are for asking it.
radicality 12/17/2025|||
How do I get Gemini to be more proactive in finding/double-checking itself against new world information and doing searches?

For that reason I still find ChatGPT way better for me: for many things I ask, it first goes off to do online research and has up-to-date information, which is surprising since you would expect Google to be way better at this. For example, I was asking Gemini 3 Pro recently how to do something with an “RTX 6000 Blackwell 96GB” card, and it told me this card doesn’t exist and that I probably meant the RTX 6000 Ada… Or just today I asked about something on macOS 26.2, and it told me to be cautious as it’s a beta release (it’s not). Whereas with ChatGPT I trust the final output more, since it very often goes to find live sources and info.

leemoore 12/18/2025|||
Gemini is bad at this sort of thing, but I find all models tend to do this to some degree. You have to know it could be coming and give the model indicators: assume your training data is out of date, and web-search for the latest as of today or this month. They aren't taught to ask themselves "is my understanding of this topic based on info that is likely out of date?" up front; they only understand it after the fact. I usually just get annoyed and low-key condescend to it for assuming its old-ass training data is sufficient grounding for correcting me.

That epistemic calibration is something they are capable of thinking through if you point it out, but they aren't trained to stop and check how confident they actually have a right to be. This is a metacognitive interrupt that gets socialized into girls between roughly 6 and 9 and into boys between 11 and 13. The interrupt that calibrates you to appropriate confidence in your own knowledge is a cognitive skill that models aren't taught and that humans learn socially, by pissing off other humans. It's why we get pissed off at models when they correct us with old, bad data: our anger is the training tool that would normally stop that behavior. They just can't take in that training signal at inference time.
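If you're hitting it through the API rather than the app, you can also just force grounding on instead of relying on prompt nudges. A minimal sketch with the @google/genai JS SDK (the model id is a guess):

    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
    const response = await ai.models.generateContent({
      model: "gemini-3-flash-preview", // model id is a guess
      contents: "What is the current stable macOS release as of today?",
      config: { tools: [{ googleSearch: {} }] }, // ground the answer with Google Search
    });
    console.log(response.text);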

andai 12/18/2025|||
Yeah any time I mention GPT-5, the other models start having panic attacks and correcting it to GPT-4. Even if it's a model name in source code!

They think GPT-5 won't be released until the distant future, but what they don't realize is we have already arrived ;)

niek_pas 12/18/2025|||
That’s funny, I’ve had the exact opposite experience. Gemini starts every answer to a coding question with, “you have hit upon a fundamental insight in zyx”. ChatGPT usually starts with, “the short answer? Xyz.”
scrollop 12/17/2025|||
Looking at these, they are:

https://artificialanalysis.ai/evaluations/omniscience

https://youtu.be/4p73Uu_jZ10?si=x1gZopegCacznUDA&t=582

pawelduda 12/18/2025||
They have been for a while. They had a first-mover advantage that kept them in the lead, but it wasn't anything others couldn't throw money at and eventually catch up on. Not so long ago everyone was talking about how Google had lost the AI race, and now it feels like they're chasing Anthropic.
SyrupThinker 12/17/2025||
I wonder if this suffers from the same issue as 3 Pro, that it frequently "thinks" for a long time about date incongruity, insisting that it is 2024, and that information it receives must be incorrect or hypothetical.

Just avoiding/fixing that would probably speed up a good chunk of my own queries.

robrenaud 12/17/2025||
Omg, it was so frustrating to say:

Summarize recent working arxiv url

And then it tells me the date is in the future and simply refuses to fetch the URL.

zhyder 12/17/2025||
Glad to see the big improvement on the SimpleQA Verified benchmark (28% -> 69%), which is meant to measure built-in factuality, i.e. without adding grounding resources. That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it will be years until the competition is over the number of 9s in such a factuality benchmark, but that'd be glorious.
jug 12/18/2025|
Yes, that's very good, because that's my main use case for Flash: queries that depend on world knowledge. Not science or engineering problems, but the kind of thing you'd ask someone who has really broad knowledge and can give quick, straightforward answers.
primaprashant 12/17/2025||
Pricing is $0.50 / $3 per million input / output tokens. 2.5 Flash was $0.30 / $2.50. That's a 66% increase in input token pricing and a 20% increase in output token pricing.

For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
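For anyone who wants to check the arithmetic, a quick sketch:

    // Percentage increase from old to new price (per million tokens)
    const pct = (oldPrice: number, newPrice: number) =>
      ((newPrice - oldPrice) / oldPrice * 100).toFixed(1);

    console.log(pct(0.3, 0.5)); // Flash input:  66.7
    console.log(pct(2.5, 3.0)); // Flash output: 20.0
    console.log(pct(1.25, 2));  // Pro input:    60.0
    console.log(pct(10, 12));   // Pro output:   20.0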

simonw 12/17/2025||
Calculating price increases is made more complex by the difference in token usage. From https://blog.google/products/gemini/gemini-3-flash/ :

> Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.

Tiberium 12/17/2025||
Yes, but also most of the increase in 3 Flash is in the input context price, which isn't affected by reasoning.
int_19h 12/17/2025||
It is affected if it has to round-trip, e.g. because it's making tool calls.
prvc 12/18/2025||
Apples to oranges.
meetpateltech 12/17/2025||
Deepmind Page: https://deepmind.google/models/gemini/flash/

Developer Blog: https://blog.google/technology/developers/build-with-gemini-...

Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/

Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...

simonw 12/17/2025||
For anyone from the Gemini team reading this: these links should all be prominent in the announcement posts. I always have to hunt around for them!
meetpateltech 12/17/2025||
Google actually does something similar for major releases - they publish a dedicated collection page with all related links.

For example, the Gemini 3 Pro collection: https://blog.google/products/gemini/gemini-3-collection/

But having everything linked at the bottom of the announcement post itself would be really great too!

simonw 12/17/2025||
Sadly there's nothing about Gemini 3 Flash on that page yet.
minimaxir 12/17/2025||
Documentation for Gemini 3 Flash in particular: https://ai.google.dev/gemini-api/docs/gemini-3
zurfer 12/17/2025||
It's a cool release, but in case someone on the Google team reads this: Flash 2.5 is awesome in terms of latency and total response time without reasoning. In quick tests this model seems to be about 2x slower. So for certain use cases, like quick one-token classification, Flash 2.5 is still the better model. Please don't stop optimizing for that!
edvinasbartkus 12/17/2025||
Did you try setting thinkingLevel to minimal?

    thinkingConfig: { thinkingLevel: "low" }

More about it here https://ai.google.dev/gemini-api/docs/gemini-3#new_api_featu...

zurfer 12/17/2025||
Yes, I tried it with minimal and it's roughly 3 seconds for prompts that take Flash 2.5 one second.

On that note it would be nice to get these benchmark numbers based on the different reasoning settings.

retropragma 12/17/2025|||
That's more of a Flash-Lite thing now, I believe.
Tiberium 12/17/2025|||
You can still set thinking budget to 0 to completely disable reasoning, or set thinking level to minimal or low.
andai 12/18/2025||
>You cannot disable thinking for Gemini 3 Pro. Gemini 3 Flash also does not support full thinking-off, but the minimal setting means the model likely will not think (though it still potentially can). If you don't specify a thinking level, Gemini will use the Gemini 3 models' default dynamic thinking level, "high".

https://ai.google.dev/gemini-api/docs/thinking#levels

Tiberium 12/18/2025||
I was talking about Gemini 3 Flash, and you absolutely can disable reasoning, just try sending thinking budget: 0. It's strange that they don't want to mention this, but it works.
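Roughly what that looks like with the @google/genai JS SDK (a sketch; the model id is a guess, and since this is undocumented for 3 Flash there's no guarantee it keeps working):

    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
    const response = await ai.models.generateContent({
      model: "gemini-3-flash-preview", // model id is a guess
      contents: "Label as spam or not_spam: 'You won a free cruise!'",
      config: { thinkingConfig: { thinkingBudget: 0 } }, // 0 = no reasoning tokens
    });
    console.log(response.text);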
andai 12/18/2025||
Gemini 3 Flash is in the second sentence.
throwaway127482 12/18/2025||
See, this is what happens when you turn off thinking completely.
bobviolier 12/17/2025||
This might also have to do with it being a preview, and only available on the global region?
rohitpaulk 12/17/2025||
Wild how this beats 2.5 Pro in every single benchmark. Don't think this was true for Haiku 4.5 vs Sonnet 3.5.
FergusArgyll 12/17/2025|
Sonnet 3.5 might have been better than Opus 3. That's my recollection, anyhow.
hubraumhugo 12/17/2025|
You can get your HN profile analyzed and roasted by it. It's pretty funny :) https://hn-wrapped.kadoa.com
onraglanroad 12/17/2025||
I didn't feel roasted at all. In fact I feel vindicated! https://hn-wrapped.kadoa.com/onraglanroad
echelon 12/17/2025|||
This is hilarious. The personalized pie charts and XKCD-style comics are great, and the roast-style humor is perfect.

I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.

Good work!

You should do a "show HN" if you're not worried about it costing you too much.

SubiculumCode 12/17/2025|||
dang https://hn-wrapped.kadoa.com/dang
WhereIsTheTruth 12/17/2025|||
This is exactly why you keep your personal life off the internet
apparent 12/17/2025|||
Pretty fucking hilarious, if completely off-topic.
knicholes 12/18/2025|||
That cut deep
peheje 12/17/2025||
This is great. I literally "LOL'd".