Gemini 3 Flash: Frontier intelligence built for speed

Posted by meetpateltech 12/17/2025

1102 points | 580 comments

elvin_d 12/17/2025|

The Gemini 3 models are great but lack a few things:

- The app experience is atrocious, with poor UX all over the place. A few examples: the text jumps around while you are reading when the model starts to respond, and slide-over view on iPad breaks the request, while Claude and ChatGPT work fine.

- Google offers two choices: your data gets used for whatever they want, or, if you want privacy, the app experience gets even worse.

zkmon 12/18/2025||

I asked it to draft an email with a business proposal, and it dated the letter October 26, 2023. When I asked why it did so, it replied that the templates it was trained on might be anchored to that date. Gemini 3 Pro also puts that same date on the letter. I didn't ask it why.

muixoozie 12/18/2025||

>ask it why

Always cracks me up asking the LLM why it said something like it really knows and won't just make up something plausible.

The scary thing is how similar we are in this regard. People confabulate and rationalize things all the time, but it's especially apparent in people who engage in denial of illness (anosognosia) due to brain damage. One well-documented example is a stroke damaging the right hemisphere of the brain and paralyzing the left side of the body. Some patients will deny that their paralyzed arm is paralyzed, make up all sorts of excuses when cross-examined or confronted with evidence of illness [0], practically hallucinate their arm working, fail to notice it's not working, etc. The video goes into at least half a dozen experiments. Mini spoiler: you can ask someone with similar brain damage a ridiculous question like "why did you just do x" (when they didn't do anything) and they'll confabulate an answer. It reminds me of the split-brain patient videos, where patients rationalize (speaking from the left side of the brain) why they did something that was communicated visually only to the right hemisphere [1].

Anyways, I was rewatching the anosognosia video the other day for the first time in like a decade and it really made me wonder how many evolutionary brain specializations it would take to more closely mimic human behavior in a machine.

[0] https://www.youtube.com/watch?v=MDHJDKPeB2A

[1] https://www.youtube.com/watch?v=lfGwsAdS9Dc&t=347

zkmon 12/18/2025||

https://ibb.co/vvcKBkL6

ofermend 12/18/2025||

Gemini-3-flash is now on the Vectara hallucination leaderboard, with a 13.5% grounded hallucination rate.

https://github.com/vectara/hallucination-leaderboard

zone411 12/17/2025||

Scores 92.0 on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/). Gemini 2.5 Flash scored 25.2, and Gemini 3 Pro scored 96.8.

alooPotato 12/17/2025||

I have a latency-sensitive application. Does anyone know of any tools that let you compare time to first token and total latency for a bunch of models at once, given a prompt? Ideally, it would run close to the DCs that serve the various models so we can take network latency out of the benchmark.
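Not an existing tool, but the measurement itself can be sketched in a few lines, assuming each model exposes a streaming response you can iterate over token by token. The names `measure_stream` and `fake_stream` are hypothetical, and the fake stream with made-up delays stands in for a real client call; swap in your own streaming request to use it.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(make_stream: Callable[[], Iterable[str]],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time a streamed response: time to first token and total latency.

    The clock starts before make_stream() is called, so request setup
    is included in both numbers.
    """
    start = clock()
    first_token_at = None
    tokens = 0
    for _ in make_stream():
        if first_token_at is None:
            first_token_at = clock()  # first chunk arrived
        tokens += 1
    end = clock()
    return {
        "ttft_s": None if first_token_at is None else first_token_at - start,
        "total_s": end - start,
        "tokens": tokens,
    }


def fake_stream(first_delay_s: float, per_token_delay_s: float,
                n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay_s)  # simulated queueing + prefill
    for i in range(n_tokens):
        if i:
            time.sleep(per_token_delay_s)  # simulated decode step
        yield f"tok{i}"


if __name__ == "__main__":
    # Model names and delays are assumptions, purely for illustration.
    candidates = {
        "model-a": lambda: fake_stream(0.05, 0.005, 20),
        "model-b": lambda: fake_stream(0.02, 0.010, 20),
    }
    for name, make_stream in candidates.items():
        r = measure_stream(make_stream)
        print(f"{name}: ttft={r['ttft_s']:.3f}s "
              f"total={r['total_s']:.3f}s tokens={r['tokens']}")
```

Running this from a machine in or near the serving region, and repeating each measurement several times, would get closer to the "network latency removed" number asked about.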

simonw 12/17/2025||

I had it draw four pelicans, one for each of its thinking levels (Gemini 3 Pro only had two thinking levels). Then I had it write me an <image-gallery> Web Component to help display the four pelicans it had made on my blog: https://simonwillison.net/2025/Dec/17/gemini-3-flash/

I also had it summarize this thread on Hacker News about itself:

https://gist.github.com/simonw/b0e3f403bcbd6b6470e7ee0623be6...

  llm \
  -f hn:46301851 -m "gemini-3-flash-preview" \
  -s 'Summarize the themes of the opinions expressed here.
  For each theme, output a markdown header.
  Include direct "quotations" (with author attribution) where appropriate.
  You MUST quote directly from users when crediting them, with double quotes.
  Fix HTML entities. Output markdown. Go long. Include a section of quotes that illustrate opinions uncommon in the rest of the piece'

Where the `-f hn:xxxx` bit resolves via this plugin: https://github.com/simonw/llm-hacker-news

Tiberium 12/17/2025||

Yet again Flash receives a notable price hike: from $0.3/$2.5 for 2.5 Flash to $0.5/$3 (+66.7% input, +20% output) for 3 Flash. Also, as a reminder, 2 Flash used to be $0.1/$0.4.
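For anyone who wants to check the percentages, here is a small sketch of the arithmetic. It uses only the per-million-token prices quoted in this comment; the helper names `pct_increase` and `compare` are made up for illustration.

```python
# USD per 1M tokens as (input, output), taken from the comment above.
PRICES = {
    "2 Flash": (0.10, 0.40),
    "2.5 Flash": (0.30, 2.50),
    "3 Flash": (0.50, 3.00),
}


def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def compare(old_model: str, new_model: str) -> tuple:
    """Return (input % increase, output % increase), rounded to 1 decimal."""
    old_in, old_out = PRICES[old_model]
    new_in, new_out = PRICES[new_model]
    return (round(pct_increase(old_in, new_in), 1),
            round(pct_increase(old_out, new_out), 1))


if __name__ == "__main__":
    for old in ("2.5 Flash", "2 Flash"):
        inp, out = compare(old, "3 Flash")
        print(f"{old} -> 3 Flash: +{inp}% input, +{out}% output")
```

The 2.5 Flash to 3 Flash row reproduces the +66.7% input and +20% output figures above.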

BeetleB 12/17/2025|

Yes, but this Flash is a lot more powerful - beating Gemini 3 Pro on some benchmarks (and pretty close on others).

I don't view this as a "new Flash" but as "a much cheaper Gemini 3 Pro/GPT-5.2"

Tiberium 12/17/2025|||

I would be less salty if they gave us 3 Flash Lite at the same price as 2.5 Flash or cheaper with better capability, but they still focus on the pricier models :(

int_19h 12/17/2025|||

We'll probably get 3 Flash Lite eventually, it just takes time to distill the models, and you want to start with the one that is likely to bring in more money.

zzleeper 12/17/2025|||

Same! I want to do some data stuff from documents and 2.0 pricing was amazing, but the constant increases go the wrong way for this task :/

jexe 12/17/2025|||

Right, depends on your use cases. I was looking forward to the model as an upgrade to 2.5 Flash, but when you're processing hundreds of millions of tokens a day (not hard to do if you're dealing in documents or emails with a few users), the economics fall apart.
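To put rough numbers on that, here is a sketch of the daily bill at that scale. The prices are the per-million-token figures quoted earlier in the thread; the volume (300M input and 30M output tokens per day) is an assumption chosen only to illustrate "hundreds of millions of tokens a day", and `daily_cost` is a hypothetical helper.

```python
# USD per 1M tokens as (input, output), as quoted earlier in the thread.
PRICES_PER_MTOK = {
    "2 Flash": (0.10, 0.40),
    "2.5 Flash": (0.30, 2.50),
    "3 Flash": (0.50, 3.00),
}


def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Daily spend in USD for the given token volumes."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (input_tokens / 1_000_000 * price_in
            + output_tokens / 1_000_000 * price_out)


if __name__ == "__main__":
    # Assumed workload, not a figure from the thread.
    input_tokens = 300_000_000
    output_tokens = 30_000_000
    for model in PRICES_PER_MTOK:
        cost = daily_cost(model, input_tokens, output_tokens)
        print(f"{model}: ${cost:,.2f}/day, ${cost * 30:,.2f}/30 days")
```

Under these assumptions the input side dominates, so the input price increase matters more than the output one for document-heavy workloads.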

inshard 12/18/2025||

Tested it on Gemini CLI and the experience is as good as, if not better than, Claude Code. Gemini CLI has come a long way and is arguably likely to surpass Claude Code at this rate of progress.

MillionOClock 12/18/2025|

What are your favorite features? I recently downloaded it and also use Codex CLI and GitHub Copilot in VS Code, but I don't really know what specific features it has that the others might not have.

inshard 12/18/2025||

The UI is better: they box the specific types of actions the orchestrator agent takes, with a clear categorization. The standard quality-of-life shortcuts, like typing a number to respond to an MCQ, are present here as well. They use specialized sub-agents, such as one with a big context window to find context in the codebase. The quotas appear to be much more generous vs CC. The agent memory management between compacting cycles seems to have a few tricks CC is missing. Also, with 3.0 Flash, it feels faster with the same level of agency and intelligence. It has a feature to focus into an interactive shell where bash commands are being executed by the orchestrator agent. It doesn't feel like Google is trying to push you to buy more credits or is relying on this product for its financial survival. I suspect CC has some dark patterns around this, where the agent burns tokens running in circles with minimal progress on bugs before you have to top up your wallet. Early days still.

poplarsol 12/17/2025|

Will be interesting to see what their quota is. Gemini 3.0 Pro only gives you 250 / day until you spam them with enough BS requests to increase your total spend > $250.
