Gemini 3 Flash: Frontier intelligence built for speed

Posted by meetpateltech 12/17/2025

1102 points | 580 commentspage 3

jaigupta 12/18/2025|

Only if I could figure out how to use it. I have been using Claude Code and enjoy it. I sometimes also try Codex which is also not bad.

Trying to use Gemini cli is such a pain. I bought GDP Premium and configured GCP, setup environment variables, enabled preview features in cli and did all the dance around it and it won't let me use gemini 3. Why the hell I am even trying so hard?

jdanbrown 12/18/2025||

Have you tried OpenRouter (https://openrouter.ai)? I’ve been happy using it as a unified api provider with great model coverage (including Google, Anthropic, OpenAI, Grok, and the major open models). They charge 5% on top of each model’s api costs, but I think it’s worth it to have one centralized place to insert my money and monitor my usage. I like being able to switch out models without having to change my tools, and I like being able to easily head-to-head compare claude/gemini/gpt when I get stuck on a tricky problem.

Then you just have to find a coding tool that works with OpenRouter. Afaik claude/codex/cursor don’t, at least not without weird hacks, but various of the OSS tools do — cline, roo code, opencode, etc. I recently started using opencode (https://github.com/sst/opencode), which is like an open version of claude code, and I’ve been quite happy with it. It’s a newer project so There Will Be Bugs, but the devs are very active and responsive to issues and PRs.

Palmik 12/18/2025|||

Why would you use OpenRouter rather than some local proxy like LiteLLM? I don't see the point of sharing data with more third parties and paying for the privilege.

Not to mention that for coding, it's usually more cost efficient to get whatever subscription the specific model provider offers.

ygouzerh 12/18/2025||

Thanks, I didn't knew about LiteLLM!

OpenRouter have some interesting providers, like Cerebras, which delivers 2,300 token/s on gpt-oss

jaigupta 12/18/2025|||

I have used OpenRouter before but in this case I was trying to use it like Claude Code (agentic coding with a simple fixed monthly subscription). I don't want to pay per use via direct APIs as I am afraid it might have surprising bills. My point was, why Google makes it so damn hard even for paid subscriptions where it was supposed to work.

qingcharles 12/18/2025|||

Have you tried Google Antigravity? I use that and GitHub Copilot when I want to use Gemini for coding tasks.

android521 12/18/2025||

use cursor. it allows you to choose any model to use.

tootyskooty 12/17/2025||

Since it now includes 4 thinking levels (minimal-high) I'd really appreciate if we got some benchmarks across the whole sweep (and not just what's presumably high).

Flash is meant to be a model for lower cost, latency-sensitive tasks. Long thinking times will both make TTFT >> 10s (often unacceptable) and also won't really be that cheap?

happyopossum 12/17/2025|

Google appears to be changing what flash is “meant for” with this release - the capability it has along with the thinking budgets make it superior to previous Pro models in both outcome and speed. The likely-soon-coming flash-lite will fit right in to where flash used to be - cheap and fast.

jug 12/17/2025||

Looks like a good workhorse model, like I felt 2.5 Flash also was at its time of launch. I hope I can build confidence with it because it'll be good to offload Pro costs/limits as well of course always nice with speed for more basic coding or queries. I'm impressed and curious about the recent extreme gains on ARC-AGI-2 from 3 Pro, GPT-5.1 and now even 3 Flash.

cakealert 12/18/2025||

Gemini 2.5 was a full broadside on OpenAI's ship.

After Gemini 3.0 the OpenAI damage control crews all drowned.

Not only is it vastly better, it's also free.

I find this particular benchmark to be in agreement with my experiences: https://simple-bench.com

whinvik 12/17/2025||

Ok, I was a bit addicted to Opus 4.5 and was starting to feel like there's nothing like it.

Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.

The weird part is Gemini 3 Pro is nowhere as good an experience. Maybe because its just so slow.

scrollop 12/17/2025||

Yes! Gemini 3 pro is significantly slower than opus (surprisingly) , and prefer opus' output.

Might be using flash for my MCP research/transcriber/minor tasks modl over haiku, now, though (will test of course)

__jl__ 12/17/2025||

I will have to try that. Cursor bill got pretty high with Opus 4.5. Never considered opus before the 4.5 price drop but now it's hard to change... :)

diamondfist25 12/17/2025||

$100 Claude max is the best subscription I’ve ever had.

Well worth every penny now

vanviegen 12/18/2025||

Or a $40 GitHub copilot plan also gets you a lot of Opus usage.

onoesworkacct 12/18/2025||

Missing a lot without claude code tho

vanviegen 12/21/2025||

I've tried both, and I'm still not sure. Claude Code steers more towards a hands-off, vibe coding approach, which I often regret later. With Copilot I'm more involved, which feels less 'magical' and takes me more time, but generally does not end in misery.

Obertr 12/17/2025||

At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed

Image model they have released is much worse than nano banana pro, ghibli moment did not happen

Their GPT 5.2 is obviously overfit on benchmarks as a consensus of many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding

The weight of the ads money from google and general direction + founder sense of Brin brought the google massive giant back to life. None of my companies workflow run on OAI GPT right now. Even though we love their agent SDK, after claude agent SDK it feels like peanuts.

avazhi 12/17/2025||

"At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed"

This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.

drawnwren 12/17/2025|||

OAI also got talent mined. Their top intellectual leaders left after fight with sama, then Meta took a bunch of their mid-senior talent, and Google had the opposite. They brought Noam and Sergey back.

mmaunder 12/17/2025|||

Yeah the only thing standing in Google's way is Google. And it's the easy stuff, like sensible billing models, easy to use docs and consoles that make sense and don't require 20 hours to learn/navigate, and then just the slew of bugs in Gemini CLI that are basic usability and model API interaction things. The only differentiator that OpenAI still has is polish.

Edit: And just to add an example: openAI's Codex CLI billing is easy for me. I just sign up for the base package, and then add extra credits which I automatically use once I'm through my weekly allowance. With Gemini CLI I'm using my oauth account, and then having to rotate API keys once I've used that up.

Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.

Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.

Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.

mips_avatar 12/17/2025|||

I'd be curious how many people use openrouter byok just to avoid figuring out the cloud consoles for gcp/azure.

vanviegen 12/18/2025|||

Openrouter is great! Prepaid, no surprise bills. Easily switch between any models you desire. Dead simple interface. Reliable. What's not to like?

metadat 12/18/2025||

With OpenRouter it can be unclear if you're getting a quantized model or not.

mmaunder 12/17/2025||||

Agreed. It's ridiculous.

visarga 12/17/2025|||

I do. Gave up using Gemini directly.

mips_avatar 12/17/2025||

I mean I do too, had a really odd Gemini bug until I did byok on openrouter

ewoodrich 12/18/2025|||

Gemini CLI via a Google One plan is the regular consumer billing flow which is pretty straightforward.

GenerWork 12/17/2025|||

I'm actually liking 5.2 in Codex. It's able to take my instructions, do a good job at planning out the implementation, and will ask me relevant questions around interactions and functionality. It also gives me more tokens than Claude for the same price. Now, I'm trying to white label something that I made in Figma so my use case is a lot different from the average person on this site, but so far it's my go to and I don't see any reason at this time to switch.

gpt5 12/17/2025||

I've noticed when it comes to evaluating AI models, most people simply don't ask difficult enough questions. So everything is good enough, and the preference comes down to speed and style.

It's when it becomes difficult, like in the coding case that you mentioned, that we can see the OpenAI still has the lead. The same is true for the image model, prompt adherence is significantly better than Nano Banana. Especially at more complex queries.

int_19h 12/17/2025|||

I'm currently working on a Lojban parser written in Haskell. This is a fairly complex task that requires a lot of reasoning. And I tried out all the SOTA agents extensively to see which one works the best. And Opus 4.5 is running circles around GPT-5.2 for this. So no, I don't think it's true that OpenAI "still has the lead" in general. Just in some specific tasks.

fellowniusmonk 12/17/2025||||

I have a very complex set of logic puzzles I run through my own tests.

My logic test and trying to get an agent to develop a certain type of ** implementation (that is published and thus the model is trained on to some limited extent) really stress test models, 5.2 is a complete failure of overfitting.

Really really bad in an unrecoverable infinite loop way.

It helps when you have existing working code that you know a model can't be trained on.

It doesn't actually evaluate the working code it just assumes it's wrong and starts trying to re-write it as a different type of **.

Even linking it to the explanation and the git repo of the reference implementation it still persists in trying to force a different **.

This is the worst model since pre o3. Just terrible.

GenerWork 12/17/2025|||

I'd argue that 5.2 just barely squeaks past Sonnet 4.5 at this point. Before this was released, 4.5 absolutely beat Codex 5.1 Medium and could pretty much oneshot UI items as long as I didn't try to create too many new things at once.

int32_64 12/17/2025|||

Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.

crazygringo 12/17/2025|||

For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

But for anyone using LLM's to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyways, as the differences between models becomes extremely apparent when you're working in a niche sub-subfield and one model is showing glaring informational or logical errors and another mostly gets it right.

And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)

smashed 12/17/2025|||

I haven't seen any LLM tech shine "where every detail matters".

In fact so far, they consistently fail in exactly these scenario, glossing over random important details whenever you double check results in depth.

You might have found models, prompts or workflows that work for you though, I'm interested.

bitpush 12/17/2025||||

> OpenAI's brand recognition shines.

We've seen this movie before. Snapchat was the darling. Infact, it invented the entire category and was dominating the format for years. Then it ran out of time.

Now very few people use Snapchat, and it has been reduced to a footnote in history.

If you think I'm exaggerating, that just proves my point.

decimalenough 12/17/2025||

Not a great example: Snapchat made it through the slump, successfully captured the next generation of teenagers, and now has around 500M DAUs.

bitpush 12/17/2025||

You might not remember, but Snapchat was once supposed to take on Facebook. The founder was so cocky that they declined being bought by Facebook because they thought they could be bigger.

I never said Snapchat is dead. It still lives on, but it is a shell of the past. They had no moat, and the competitors caught up (Instagram, Whatsapp and even LinkedIn copied Snapchat with stories .. and rest is history)

xbmcuser 12/17/2025||||

Google biggest advantage over time will be costs. They have their own hardware which they can and will optimise for their LLMS. And Google has experience of getting market share over time by giving better results, performance or space. ie gmail vs hotmail/yahoo. Chrome vs IE/Firefox. So don't discount them if the quality is better they will get ahead over time.

int_19h 12/17/2025||

It already is costs. Their Pro plan has much more generous limits compared to both OpenAI and especially Anthropic. You get 20 Deep Research queries with Pro per day, for example.

rfw300 12/17/2025||||

That might be true for a narrow definition of chatbots, but they aren't going to survive on name recognition if their models are inferior in the medium term. Right now, "agents" are only really useful for coding, but when they start to be adopted for more mainstream tasks, people will migrate to the tools that actually work first.

holler 12/17/2025||||

this. I don't know any non-tech people who use anything other than chatgpt. On a similar note, I've wondered why Amazon doesn't make a chatgpt-like app with their latest Alexa+ makeover, seems like a missed opportunity. The Alexa app has a feature to talk to the LLM in chat mode, but the overall app is geared towards managing devices.

macNchz 12/17/2025|||

Google has great distribution to be able to just put Gemini in front of people who are already using their many other popular services. ChatGPT definitely came out of the gate with a big lead on name recognition, but I have been surprised to hear various non-techy friends talking about using Gemini recently, I think for many of them just because they have access at work through their Workspace accounts.

Obertr 12/17/2025||||

Most of Europe if full of Gemini ads, my parents use Gemini because it is free and it popped up in YouTube ad before the video

Just go outside the bubble plus take a bit older people

ewoodrich 12/18/2025||

Yeah my parents never really cared enough to explore ChatGPT despite hearing about it 10 times a day in news/media for the last few years. But recently my mom started using Google's AI Search mode after first trying it while doing research for house hunting and my dad uses the Gemini app for occasional questions/identifying parts and stuff (he has always loved Google Lens so those sort of interactive multimedia features are the main pull vs plain text chatbot conversations).

They are both Android/Google Search users so all it really took was "sure I guess I'll try that" in response to a nudge from Google. For me personally I have subscriptions to Claude/ChatGPT/Gemini for coding but use Gemini for 90% of chatbot questions. Eventually I'll cancel some of them but will probably keep Gemini regardless because I like having the extra storage with my Google One plan bundle. Google having a pre-existing platform/ecosystem is a huge advantage imo.

nimchimpsky 12/17/2025|||

[dead]

fullstick 12/17/2025||||

I doubt anyone I know who is using llms outside of work knows that there are benchmark tests for these models.

jay_kyburz 12/17/2025|||

This is why both google and microsoft are pushing Gemini and Copilot in everyone's face.

dieortin 12/17/2025|||

Is there anything pointing to Brin having anything to do with Google’s turnaround in AI? I hear a lot of people saying this, but no one explaining why they do

novok 12/17/2025|||

In organizations, everyone's existence and position is politically supported by their internal peers around their level. Even google's & microsoft's current CEOs are supported by their group of co-executives and other key players. The fact that both have agreeable personalities is not a mistake! They both need to keep that balance to stay in power, and that means not destroying or disrupting your peer's current positions. Everything is effectively decided by informal committee.

Founders are special, because they are not beholden to this social support network to stay in power and founders have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden too are their co-founders, and in some cases major investor groups. This gives them the ability to disregard this social balance because they are not dependent on it to stay on power. Their power source is external to the organization, while everyone else is internal to it.

This gives them a very special "do something" ability that nobody else has. It can lead to failures (zuck & occulus, snapchat spectacles) or successes (steve jobs, gemini AI), but either way, it allows them to actually "do something".

JumpCrisscross 12/17/2025||

> Founders are special, because they are not beholden to this social support network to stay in power

Of course they are. Founders get fired all the time. As often as non-founder CEOs purge competition from their peers.

> The only others they are beholden too are their co-founders, and in some cases major investor groups

This describes very few successful executives. You can have your co-founders and investors on board, if your talent and customers hate you, they’ll fuck off.

HarHarVeryFunny 12/17/2025||||

I would say it more goes back to the Google Brain + DeepMind merger, creating Google DeepMind headed by Demis Hassabis.

The merger happened in April 2023.

Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.

ryoshu 12/17/2025|||

If he's having an impact it's because he can break through the bureaucracy. He's not trying to protect a fiefdom.

raincole 12/17/2025|||

That's a quite sensationalized view.

Ghibli moment was only about half a year ago. At that moment, OpenAI was so far ahead in terms of image editing. Now it's behind for a few months and "it can't be reversed"?

Obertr 12/17/2025|||

Check the size and budget of Google iniatives. It’s unlimited

akie 12/18/2025||

Google basically has unlimited budget and unlimited data. If they're ahead now, which I believe they are, they'll be very very difficult to catch.

BoredPositron 12/17/2025|||

The Ghibli moment was an influencer fad not real advancement.

baq 12/17/2025|||

GPT 5.2 is actually getting me better outputs than Opus 4.5 on very complex reviews (on high, I never use less) - but the speed makes Opus the default for 95% of use cases.

yieldcrv 12/17/2025|||

the trend I've seen is that none of these companies are behind in concept and theory, they are just spending longer intervals baking a more superior foundational model

so they get lapped a few times and then drop a fantastic new model out of nowhere

the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc

they're all shuffling the same talent around, its California, that's how it goes, the companies have the same institutional knowledge - at least regarding their consumer facing options

JumpCrisscross 12/17/2025|||

> I start to believe OAI is very much behind

Kara Swisher recently compared OpenAI to Netscape.

Andrex 12/18/2025||

Ouch.

Maybe we'll get some awesome FOSS tech out of its ashes?

JumpCrisscross 12/18/2025||

We’ll get a bail-out and then a massive data-centre and energy-production build-out.

louiereederson 12/17/2025|||

i think the most important part of google vs openai is slowing usage of consumer LLMs. people focus on gemini's growth, but overall LLM MAUs and time spent is stabilizing. in aggregate it looks like a complete s-curve. you can kind of see it in the table in the link below but more obvious when you have the sensortower data for both MAUs and time spent.

the reason this matters is slowing velocity raises the risk of featurization, which undermines LLMs as a category in consumer. cost efficiency of the flash models reinforces this as google can embed LLM functionality into search (noting search-like is probably 50% of chatgpt usage per their july user study). i think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and search dwarfs LLMs in this respect.

https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...

aswegs8 12/18/2025|||

Not sure why they just not replicate the workflow that nano banana pro uses. It lets the thinking model generate a detailed description and then renders that image. When I use ChatGPT thinking model and render an image I also get pretty good results. It's not as creative or flexible as nano banana pro, but it produces really useful results.

random9749832 12/17/2025|||

This is obviously trained on Pro 3 outputs for benchmaxxing.

CuriouslyC 12/17/2025|||

Not trained on pro, distilled from it.

viraptor 12/17/2025||

What do you think distilled means...?

CuriouslyC 12/17/2025||

It's good to keep the language clear, because you could pretrain/sft on outputs (as many labs do), which is not the same thing.

NitpickLawyer 12/17/2025|||

> for benchmaxxing.

Out of all the big4 labs, google is the last I'd suspect of benchmaxxing. Their models have generally underbenched and overdelivered in real world tasks, for me, ever since 2.5 pro came out.

encroach 12/17/2025|||

OAI's latest image model outperforms Google's in LMArena in both image generation and image editing. So even though some people may prefer nano banana pro in their own anecdotal tests, the average person prefers GPT image 1.5 in blind evaluations.

https://lmarena.ai/leaderboard/text-to-image

https://lmarena.ai/leaderboard/image-edit

Obertr 12/17/2025||

Add This to Gemini distribution which is being adcertised by Google in all of their products, and average Joe will pick the sneakers at the shelf near the checkout rather than healthier option in the back

gdhkgdhkvff 12/17/2025|||

Those darn sneakers are just too delicious!

encroach 12/17/2025||||

That's not how the arena works. The evaluation is blind so Google's advertising/integration has no effect on the results.

Obertr 12/17/2025||

3 points, sure

encroach 12/17/2025||

Right, it only scores 3 points higher on image edit, which is within the margin of error. But on image generation, it scores a significant 29 points higher.

raincole 12/17/2025|||

...and what does this have to do with the comment you replied to? Did you reply to the wrong person or you were just stating unrelated factoids?

nightski 12/17/2025||

Google has incredible tech. The problem is and always has been their products. Not only are they generally designed to be anti-consumer, but they go out of their way to make it as hard as possible. The debacle with Antigravity exfiltrating data is just one of countless.

novok 12/17/2025||

The Antigravity case feels like a pure bug and them rushing to market. They had a bunch of other bugs showing that. That is not anti-consumer or making it difficult.

acheong08 12/17/2025||

Thinking along the line of speed, I wonder if a model that can reason and use tools at 60fps would be able to control a robot with raw instructions and perform skilled physical work currently limited by the text-only output of LLMs. Also helps that the Gemini series is really good at multimodal processing with images and audio. Maybe they can also encode sensory inputs in a similar way.

Pipe dream right now, but 50 years later? Maybe

incognito124 12/17/2025||

Believe it or not, there's Gemini Robotics, which seems to be exactly what you're talking about:

https://deepmind.google/models/gemini-robotics/

Previous discussions: https://news.ycombinator.com/item?id=43344082

iamgopal 12/17/2025||

Much sooner, hardware, power, software, even AI model design, inference hardware, cache, everything being improved , it's exponential.

bearjaws 12/17/2025||

I've been using the preview flash model exclusively since it came out, the speed and quality of response is all I need at the moment. Although still using Claude Code w/ Opus 4.5 for dev work.

Google keeps their models very "fresh" and I tend to get more correct answers when asking about Azure or O365 issues, ironically copilot will talk about now deleted or deprecated features more often.

sv123 12/17/2025|

I've found copilot within the Azure portal to be basically useless for solving most problems.

djeastm 12/17/2025||

Me too. I don't understand why companies think we devs need a custom chat on their website when we all have access to a chat with much smarter models open in a different tab.

golem14 12/17/2025||

That's not what they are thinking. They are thinking: "We want to capture the dev and make them use our model – since it is easier to use it in our tab, it can afford to be inferior. This way we get lots of tasty, tasty user data."

GaggiX 12/17/2025|

They went too far, now the Flash model is competing with their Pro version. Better SWE-bench, better ARC-AGI 2 than 3.0 Pro. I imagine they are going to improve 3.0 Pro before it's no more in Preview.

Also I don't see it written in the blog post but Flash supports more granular settings for reasoning: minimal, low, medium, high (like openai models), while pro is only low and high.

minimaxir 12/17/2025||

"minimal" is a bit weird.

> Matches the “no thinking” setting for most queries. The model may think very minimally for complex coding tasks. Minimizes latency for chat or high throughput applications.

I'd prefer a hard "no thinking" rule than what this is.

GaggiX 12/17/2025||

It still supports the legacy mode of setting the budget, you can set it to 0 and it would be equivalent to none reasoning effort like gpt 5.1/5.2

minimaxir 12/17/2025||

I can confirm this is the case via the API, but annoyingly AI Studio doesn't let you do so.

skerit 12/17/2025|||

> They went too far, now the Flash model is competing with their Pro version

Wasn't this the case with the 2.5 Flash models too? I remember being very confused at that time.

JohnnyMarcone 12/17/2025||

This is similar to how Anthropic has treated sonnet/opus as well. At least pre opus 4.5.

To me it seems like the big model has been "look what we can do", and the smaller model is "actually use this one though".

jug 12/17/2025||

I'm not sure how I'm going to live with this!

More comments...