
Posted by spectraldrift 11 hours ago

Gemini 3.5 Flash(blog.google)
https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...
643 points | 474 comments | page 2
margorczynski 6 hours ago|
Wow at the price hike. Still, I think in the long run the Chinese will win if they're able to produce hardware comparable to Nvidia's.
hedora 3 hours ago||
Why would the Chinese sell me nvidia cards? I can just buy an AMD iGPU, and the perf/$ is much better than nvidia dGPUs.

(Typed on a 2023 macbook perfectly capable of running the Chinese open weight models.)

650REDHAIR 4 hours ago|||
I've had the $20 Gemini plan to use when my local setup runs into tougher problems and the throttling today has been bonkers. I canceled my subscription and will look into upgrading my local setup.
Culonavirus 3 hours ago|||
Doesn't need to be the Chinese. It can be anyone without stratospheric Nvidia margins. The Gold Rush phase of the AI economy (aka "the bubble") is beginning to slow down and the Optimization phase is just beginning to ramp up (we see this with massive bumps to token cost and token burn rate of pretty much all frontier models, plus the general pivot away from your typical individual chat end-users to businesses and employees of said businesses), and there will come a time when "nvidia has the best software stack" will not mean much for the big players. Organically, I think it already kinda doesn't; it's just masked by the inertia of massive circular deals and Nvidia selling its services to itself (entities it backs/invests in).
HDBaseT 5 hours ago||
Isn't China also allowed to purchase Nvidia GPUs now?
xbmcuser 21 minutes ago|||
Most Chinese companies will now avoid Nvidia GPUs, and as much American tech as they can, when it comes to serving AI. They know the supply can be stopped at any time by the US, or maybe even by their own government, so the risk premium is too high. They might still use Nvidia to build the models, but not for running them and serving customers.
verdverm 2 hours ago|||
Up to the H200 iirc, but they haven't made a purchase yet afaik. The experts in such things believe that if they do make a purchase, it will be a token one. Xi is pushing hard for indigenous production, not for becoming "hooked" on American AI chips, as some (not so bright) people think we can cause to happen.
npn 9 hours ago||
The price is crazy.

And I guess Gemini 3.5 Pro will get the same price increase, too. 12 x 5 = 60?

It seems like google does want us to use Chinese models.

brianwawok 6 hours ago|
What exactly are you doing with this that you can’t generate $1.50 of value per million tokens?
bel8 6 hours ago|||
Generate 5x more value for the same amount of money.
s3p 4 hours ago|||
Wrong question.

Right question: What exactly is Google's plan for the long term pricing of these models, and are we all going to be priced out in a year?

wg0 8 hours ago||
3x price increase for an almost similar model. And they said AI would be cheaper and ubiquitous.
alexandre_m 8 hours ago||
Ubiquitous like the crack epidemic.
verdverm 8 hours ago||
or 3/4 the price (of 3.1 Pro) if we believe their benchmarks
AgentMasterRace 1 hour ago||
Gemini 3.1 Pro is literally the worst AI when I cycle from Opus to GPT 5.5 then finally Gemini. It's actually insane that it's a frontier model. I rage at it more than my wife.
ElenaDaibunny 56 minutes ago||
but latency in real GUI workflows with 50+ steps is still the elephant in the room for cloud-based agents
OsrsNeedsf2P 10 hours ago||
Beats 3.1 Pro for price per token, but Artificial Analysis is showing it's dumber per token and costs more overall
golfer 9 hours ago||
Arena.ai is saying "Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers."

https://x.com/arena/status/2056793180998361233

nicce 7 hours ago||
Not sure what to think about this. There isn't even GPT 5.5.
sauwan 10 hours ago|||
Yeah, bummer. I was very excited for this release, but this killed it.
droidjj 9 hours ago||
The pricing is an issue.
asar 10 hours ago||
$1.50/M input tokens, $9/M output tokens

6x the price of 3.1 flash lite

Aunche 9 hours ago||
"Flash-Lite" is a different product from "Flash", which is more expensive. They couldn't be more confusing with their naming though, especially since they have 3.1 Pro and not 3.1 Flash non-lite.
WarmWash 10 hours ago|||
I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models are by far more token-light per task than any other model.

Cost per task is a more productive measure, but obviously a more difficult one to benchmark.
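
To make the cost-per-task point concrete, here is a minimal Python sketch. The $1.50/$9 per-million prices are the ones quoted in this thread for Gemini 3.5 Flash; the second model's prices and every token count are made-up placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Prices in USD per one million tokens."""
    input_per_m: float
    output_per_m: float


def cost_per_task(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task, given the tokens it consumed and produced."""
    return (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000


# Prices quoted in this thread for Gemini 3.5 Flash.
flash = Pricing(input_per_m=1.50, output_per_m=9.00)
# Hypothetical model that is 3x cheaper per token (placeholder prices).
cheaper = Pricing(input_per_m=0.50, output_per_m=3.00)

# Hypothetical usage: the cheaper model burns 4x the tokens on the same task.
flash_cost = cost_per_task(flash, input_tokens=20_000, output_tokens=2_000)
cheaper_cost = cost_per_task(cheaper, input_tokens=80_000, output_tokens=8_000)

print(f"flash:   ${flash_cost:.4f} per task")
print(f"cheaper: ${cheaper_cost:.4f} per task")
```

With these placeholder numbers the model that is cheaper per token ends up more expensive per task, which is why per-token price alone can mislead.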

iwhalen 10 hours ago|||
I wonder why they didn't discuss price in the post?

Compare to the GPT-5.5 announcement: https://openai.com/index/introducing-gpt-5-5/

himata4113 10 hours ago|||
I don't think input/output pricing matters, 90% of the cost is cache. $0.15 is pretty good, but still very expensive.
wolttam 10 hours ago|||
It depends on the use case. Yes, 90% of cost is cache in agentic coding scenarios (actually 95% in my experience). But not when the model reasons for 200k+ tokens before answering a complex problem.
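
A rough Python sketch of how the cost split shifts between those two scenarios. The per-million prices ($0.15 cached, $1.50 input, $9 output) are the ones quoted in this thread; the token mixes are hypothetical, and billing reasoning tokens at the output rate is an assumption.

```python
def cost_shares(cached_tokens, fresh_input_tokens, output_tokens,
                cached_per_m=0.15, input_per_m=1.50, output_per_m=9.00):
    """Return each category's share of the total cost (shares sum to 1.0)."""
    costs = {
        "cached": cached_tokens * cached_per_m / 1_000_000,
        "input": fresh_input_tokens * input_per_m / 1_000_000,
        "output": output_tokens * output_per_m / 1_000_000,
    }
    total = sum(costs.values())
    if total == 0:
        raise ValueError("no tokens were billed")
    return {name: cost / total for name, cost in costs.items()}


# Hypothetical agentic coding session: a long context re-read from cache every turn.
agentic = cost_shares(cached_tokens=50_000_000,
                      fresh_input_tokens=300_000,
                      output_tokens=50_000)

# Hypothetical hard problem: short prompt, 200k reasoning tokens
# (assumed to be billed at the output rate).
reasoning = cost_shares(cached_tokens=0,
                        fresh_input_tokens=5_000,
                        output_tokens=200_000)

print({name: f"{share:.1%}" for name, share in agentic.items()})
print({name: f"{share:.1%}" for name, share in reasoning.items()})
```

Under these assumptions cached tokens dominate the agentic bill, while the long-reasoning case is almost entirely output cost.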
himata4113 10 hours ago||
Gemini models solve a problem in 80% fewer tokens so that's something to think about.
johaugum 9 hours ago||
Source?
himata4113 8 hours ago||
https://help.kagi.com/kagi/ai/llm-benchmark.html
simonw 9 hours ago||||
Gemini caching is confusing though:

  $0.15 / million tokens
  $1.00 / 1,000,000 tokens per hour (storage price)

I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.
simonw 8 hours ago||
As far as I can tell Gemini caching DOES work like OpenAI - see implicit caching here: https://ai.google.dev/gemini-api/docs/caching

I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.

The "storage price" quoted is for an optional Gemini feature that most people don't care about: https://ai.google.dev/gemini-api/docs/caching#explicit-cachi...
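
One way to check this on your own traffic is to total the cached counts across saved responses. The sketch below is a minimal example, assuming each response has been parsed into a dict with a `usageMetadata` object holding `promptTokenCount` and, when there was a cache hit, `cachedContentTokenCount`. Only the last field name comes from the comment above; the other two, and the assumption that the prompt count includes cached tokens, should be checked against the API docs. The sample data is made up.

```python
def cache_hit_rate(responses):
    """Fraction of prompt tokens that were reported as served from cache."""
    prompt_total = 0
    cached_total = 0
    for resp in responses:
        usage = resp.get("usageMetadata", {})
        prompt_total += usage.get("promptTokenCount", 0)
        # Treated as zero when the field is missing (assumed to mean no cache hit).
        cached_total += usage.get("cachedContentTokenCount", 0)
    return cached_total / prompt_total if prompt_total else 0.0


# Made-up responses: the first request misses, the next two reuse the prefix.
sample = [
    {"usageMetadata": {"promptTokenCount": 4000}},
    {"usageMetadata": {"promptTokenCount": 4200, "cachedContentTokenCount": 3900}},
    {"usageMetadata": {"promptTokenCount": 4400, "cachedContentTokenCount": 4100}},
]

rate = cache_hit_rate(sample)
print(f"{rate:.1%} of prompt tokens came from cache")
```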

__jl__ 10 hours ago||||
In our experience, caching is not very reliable with Google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also what kind of cache hit rate you get.
svachalek 9 hours ago||
In my experience Google is the most flaky in general, which is surprising considering the rock solid history of their search and other products. Just more likely not to respond at all, to give a response out of left field, to handle the same error in 12 different ways randomly (a rainbow of HTTP status codes and error messages), etc etc.
gwern 6 hours ago|||
I agree. https://aistudio.google.com/ is shockingly bad. I'm not sure I've ever used such a flaky Google service before. It's so much worse than Gmail or Google, not to mention ChatGPT or Claude or DeepSeek or Kimi or Midjourney web interfaces. The bizarre janky integration with your Google Drive, or Gemini or NBPs randomly erroring out, often indefinitely. I've had sessions refresh themselves and just... disappear. Or when you get frustrated with a buggy dead session and hit 'new session' and have to wait minutes for 'saving...' to happen.
veselin 8 hours ago|||
Exactly our experience too. Effectively, we catch these status codes and send the request to OpenAI instead. Retrying the same query on Gemini has a high chance of giving the same kind of status code.
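
A minimal Python sketch of that kind of failover, for illustration only. `ProviderError`, the set of status codes, and the stub providers are all hypothetical; a real setup would wrap each vendor's client and map its errors into something like this.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


# Assumed set of statuses worth failing over on; tune to what you observe.
FAILOVER_STATUSES = frozenset({429, 500, 502, 503, 504})


def complete_with_fallback(prompt: str,
                           providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider once, in order, instead of retrying the same one."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderError as err:
            if err.status not in FAILOVER_STATUSES:
                raise  # e.g. 400: the request is bad, another provider won't help
            last_error = err
    raise RuntimeError("all providers failed") from last_error


# Stub providers standing in for real API clients.
def flaky_primary(prompt: str) -> str:
    raise ProviderError(503)


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = complete_with_fallback("hello", [flaky_primary, stable_secondary])
print(result)
```

Skipping the retry on the primary matches the observation that a repeat request tends to fail the same way.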
minimaxir 10 hours ago|||
10% of input pricing is standard, especially compared to the competition.
himata4113 10 hours ago||
Yeah, which means that in the end the input cost is the only value worth paying attention to, plus the cache discount (10x). If Google started offering a 20x discount, it would be half the price even with input and output pricing unchanged.
John7878781 10 hours ago||
[deleted]
stri8ed 10 hours ago||
Output cost is 3x that of Gemini 3 Flash.
s3p 10 hours ago||
Yikes. I think the concept of a 'flash' model is changing, no? Google used to market this as its lower-intelligence, faster, cheaper option. I appreciate that they are delivering on both of those, but personally I would appreciate if they could create an incremental knowledge improvement while holding price steady. Fortune 500 companies have to make their money I guess.
kilpikaarna 17 minutes ago||
Real smart. I’ve come to associate ”Flash” with ”useless make-shit-up”, and always look for Thinking/Pro when I see it set. Now, suddenly, there is only Flash?
2001zhaozhao 9 hours ago|||
I think flash just means "fast" now
likium 8 hours ago|||
My guess is Gemini Pro coming later will be 2x more, bringing it in line with Opus’s pricing.
toraway 8 hours ago|||
That would be Flash Lite now, and I'm also interested in the cheaper end of things so kinda disappointed they didn't release 3.5 Flash Lite at the same time...
brikym 7 hours ago||
How is this progress? The token cost just keeps going up and up. Flash is the new Pro? Do the models actually cost more to run or is it fattening margins?
nikhilpareek13 7 hours ago|
Worth noting that Google marked this stable rather than preview, which is unusual compared to their recent releases. Pair that with the 3x price hike and Flash pricing now reads like the long-term floor they want, not a temporary thing they will walk back later. But it's hard to tell yet whether that's Google specifically reading the room or the whole industry quietly resetting the cheap-inference baseline.