Posted by meetpateltech 6 days ago
Developer Blog: https://blog.google/technology/developers/build-with-gemini-...
Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/
Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...
Deepmind Page: https://deepmind.google/models/gemini/flash/
I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.
Fast = Gemini 3 Flash without thinking (or very low thinking budget)
Thinking = Gemini 3 Flash with high thinking budget
Pro = Gemini 3 Pro with thinking
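If that's right, the app modes presumably just map onto different request configs, roughly like this (the model ids and level values here are my guesses, nothing official):

    // Guessed mapping of app modes to API configs; model ids are placeholders
    const fast     = { model: "gemini-3-flash", config: { thinkingConfig: { thinkingLevel: "low"  } } };
    const thinking = { model: "gemini-3-flash", config: { thinkingConfig: { thinkingLevel: "high" } } };
    const pro      = { model: "gemini-3-pro",   config: { thinkingConfig: { thinkingLevel: "high" } } };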
>Fast = 3 Flash
>Thinking = 3 Flash (with thinking)
>Pro = 3 Pro (with thinking)
- "Thinking" is Gemini 3 Flash with higher "thinking_level"
- "Pro" is Gemini 3 Pro. It doesn't mention "thinking_level", but I assume it is set to high-ish.

When I ask Gemini 3 Flash this question, the answer is vague, but agency comes up a lot. Gemini's thinking is always triggered by a query.
This seems like a higher-level programming issue to me. Turn it into a loop. Keep the context. Those two things make it costly for sure. But does it make it an AGI? Surely Google has tried this?
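Something like this is what I have in mind; a naive sketch, where askModel is a stand-in for whatever LLM call you want, not any particular API:

    // Naive agent loop: keep calling the model and feed the whole transcript
    // back in as context each time. Context grows every step, which is why it's costly.
    type Turn = { role: "user" | "model"; text: string };

    async function agentLoop(goal: string, askModel: (history: Turn[]) => Promise<string>) {
      const history: Turn[] = [{ role: "user", text: goal }];
      for (let step = 0; step < 10; step++) {
        const reply = await askModel(history);
        history.push({ role: "model", text: reply });
        if (reply.includes("DONE")) break;                 // crude stopping condition
        history.push({ role: "user", text: "Keep going." });
      }
      return history;
    }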
Which obviously opens up a can of worms regarding who should have authority to supply the "right answer," but still... lacking the core capability, AGI isn't something we can talk about yet.
LLMs will be a part of AGI, I'm sure, but they are insufficient to get us there on their own. A big step forward but probably far from the last.
The problem is that once we figure out how to do this, each copy of the original model will diverge in wildly unexpected ways. Just as there are 8 billion different people in this world, we'll have 16 gazillion different AIs, all interacting with each other and remembering all of those interactions. This world scares me greatly.
- An AGI wouldn't hallucinate; it would be consistent, reliable, and aware of its own limitations.
- An AGI wouldn't need extensive re-training, human-reinforced training, or model updates. It would be capable of true self-learning / self-training in real time.
- An AGI would demonstrate genuine understanding and mental modeling, not pattern matching over correlations.
- It would demonstrate agency and motivation, not be purely reactive to prompting.
- It would have persistent, integrated memory. LLMs are stateless and driven by the current context.
- It should even demonstrate consciousness.
And more. I agree that what we've designed is truly impressive and simulates intelligence at a really high level. But true AGI is far more advanced.
I don't believe the "consciousness" qualification is at all appropriate; I would argue it is a projection of the human machine's experience onto an entirely different machine with a substantially different existential topology: a different relationship to time and sensorium. I don't think artificial general intelligence is a binary label applied only if a machine rigidly simulates human agency, memory, and sensing.
I disagreed with most of your assertions even before I hit the last point. This is just about the most extreme thing you could ask for. I think very few AI researchers would agree with this definition of AGI.
Their data retention controls, for both consumer and business accounts, suck. They're the worst of any of the leaders.
For that reason I still find ChatGPT way better for me: many things I ask, it first goes off to do online research and comes back with up-to-date information, which is surprising since you would expect Google to be way better at this. For example, I was recently asking Gemini 3 Pro how to do something with an "RTX 6000 Blackwell 96GB" card, and it told me the card doesn't exist and that I probably meant the RTX 6000 Ada... Or just today I asked about something on macOS 26.2, and it told me to be cautious as it's a beta release (it's not). Whereas with ChatGPT, I trust the final output more, since it very often goes and finds live sources and info.
That epistemic calibration is something they are capable of thinking through if you point it out, but they aren't trained to stop and ask/check themselves on how confident they have a right to be. This is a metacognitive interrupt that is socialized into girls between 6 and 9 and into boys between 11 and 13. The metacognitive interrupt that calibrates confidence to the appropriate level is a skill that models aren't taught and that humans learn socially, by pissing off other humans. It's why we get pissed off at models when they correct us with old, bad data: our anger is the training tool to stop them doing that. They just can't take in that training signal at inference time.
They think GPT-5 won't be released until the distant future, but what they don't realize is we have already arrived ;)
Just avoiding/fixing that would probably speed up a good chunk of my own queries.
For example, I'll ask it to summarize a recent, working arXiv URL.
And then it tells me the date is from the future and it simply refuses to fetch the URL.
For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
> Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.
For example, the Gemini 3 Pro collection: https://blog.google/products/gemini/gemini-3-collection/
But having everything linked at the bottom of the announcement post itself would be really great too!
thinkingConfig: { thinkingLevel: "low", }
More about it here https://ai.google.dev/gemini-api/docs/gemini-3#new_api_featu...
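For context, a full request with that setting looks roughly like this with the JS SDK (the model id is a placeholder; check the linked docs for the current one):

    // Rough sketch of a request with an explicit thinking level (model id is a placeholder)
    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
    const response = await ai.models.generateContent({
      model: "gemini-3-flash-preview",
      contents: "Explain what a thinking level does, in two sentences.",
      config: {
        thinkingConfig: { thinkingLevel: "low" },
      },
    });
    console.log(response.text);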
On that note, it would be nice to get these benchmark numbers for each of the different reasoning settings.
I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.
Good work!
You should do a "Show HN" if you're not worried about it costing you too much.