Posted by spectraldrift 10 hours ago

Gemini 3.5 Flash(blog.google)
https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...
643 points | 474 comments
easygenes 1 hour ago|
For those who would like to know the total and active parameter count of this model: even though Google doesn't disclose the model technicals, we can infer them within relatively tight margins based on what we do know.

We know they serve the model on TPU 8i, which we have plenty of hard specs for (so we know the key constraints: total memory and bandwidth and compute flops). We can also set a ceiling on the compute complexity and memory demand of the model based on knowing they will be at least as efficient as what is disclosed in the Deepseek V4 Technical Report.

We can also assume that the model was explicitly built to run efficiently in a RadixAttention style batched serving scenario on a single TPU 8i (so no tensor parallelism, etc. to avoid unnecessary overheads... Google explicitly designed the 8th-generation inference architecture to eliminate the need for tensor sharding on mid-sized models).

We know Google intends to serve this model at a floor speed of around 280 tok/s too.

Putting all these pieces together, we can confidently say this model is ~250-300B total, and 10-16B active parameters. Likely mostly FP4 with FP8 where it matters most.

Visual:

  ┌────────────────────────────────────────────────────────┐
  │                   TPU 8i VRAM (288 GB)                 │
  ├───────────────────────────┬────────────────────────────┤
  │   Static Model Weights    │  Dynamic Allocations &     │
  │   (250B - 300B @ Mixed    │  Compressed KV Caches      │
  │   FP4/FP8)                │  (RadixAttention / SRAM)   │
  │   ~110 GB - 150 GB        │  ~138 GB - 178 GB          │
  └───────────────────────────┴────────────────────────────┘
I do model serving optimization work. This is napkin math.

Edit: There's one factor I under-rated in my initial estimate... TurboQuant. This is a compute to KV memory use tradeoff. It's plausible with TurboQuant at a quality-neutral setting they've gotten the model up to 400B with similar economics. This is a variable affecting concurrency, and the way they decided total model size was likely based on what they see for the average user's average KV cache depth in real-world usage.
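
The weight-memory side of this napkin math can be redone with a minimal Python sketch. The 288 GB capacity is the figure in the diagram above; the `fp8_fraction` parameter and the choice to ignore quantization scales and other metadata are simplifying assumptions for illustration, not anything Google has disclosed.

```python
# Napkin math only: weight memory for a mixed FP4/FP8 model.
# Assumptions (hypothetical): FP4 = 0.5 bytes/param, FP8 = 1 byte/param,
# and no overhead for quantization scales, embeddings, or activations.

HBM_GB = 288  # capacity shown in the diagram above


def weight_gb(total_params_b: float, fp8_fraction: float = 0.0) -> float:
    """Weight footprint in GB for a model with `total_params_b` billion parameters."""
    bytes_per_param = 0.5 * (1.0 - fp8_fraction) + 1.0 * fp8_fraction
    return total_params_b * bytes_per_param


def kv_budget_gb(total_params_b: float, fp8_fraction: float = 0.0) -> float:
    """Memory left over for KV caches and other dynamic allocations."""
    return HBM_GB - weight_gb(total_params_b, fp8_fraction)


if __name__ == "__main__":
    for params_b in (250, 300, 400):
        print(params_b, weight_gb(params_b), kv_budget_gb(params_b))
```

At pure FP4 this gives 125 GB of weights for 250B and 150 GB for 300B. A 400B model would take 200 GB and leave 88 GB for everything else, which is why KV cache compression matters for the larger estimate.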

gertlabs 45 minutes ago||
We've been really impressed with the performance of ~30B parameter class models and how close they are to the frontier from ~6-12 months ago, which begs the question, are the frontier labs really serving 10T parameter models? Seems unlikely.

If these Gemini 3.5 numbers are accurate, then I'd wager GPT 5.5 and Opus 4.7 are a lot smaller than people have speculated, too. It's not that frontier labs can't create a 5T+ parameter model, but they don't have the data to optimize a model of that size.

Gemini 3.5 Flash is really smart in one-shot coding reasoning, btw. Near the frontier. But it doesn't do so well in long horizon agentic tasks with arbitrary tool availability. This is a common theme with Google models, and the opposite of what we see with Chinese models (start dumb, iterate consistently toward a smart solution).

Data at https://gertlabs.com/rankings

easygenes 40 minutes ago||
We know from NVIDIA's public Vera Rubin inference engine marketing materials that the frontier lab models are ~1-2T total.

Mythos is an exception that's larger.

daemonologist 1 hour ago|||
If this is accurate it raises the question: why is this model so expensive? DeepSeek v4 Flash is 284B total/13B active, FP4/FP8 mixed, and only costs $0.14/$0.28 - even less from OpenRouter. Of course Gemini 3.5 Flash is most likely a better product, and therefore it can command a higher price from an economics perspective, but does this imply Google is taking roughly a 90% profit margin on inference? If so they're either very compute-limited or confident in the model and wanting to recoup training/fixed costs (or both).
xmonkee 56 minutes ago|||
Well, we use flash models extensively (both 2.5 and 3.1) and I cannot overstate this, google cannot fucking serve them without 503s 70% of the time on most days

I think it’s pure economics. Flash models are OP for the price, which leads to too much demand, and google cannot serve it. This one is likely priced high to reduce load and hey, if it still makes money just keep the margin.

WarmWash 44 minutes ago|||
Rumor is that GCP was happily selling compute to competitors. After all, under the hood, Google is closer to a federation than a corporation. The state of GCP doesn't care about the state of Gemini.
happyopossum 22 minutes ago||
> Rumor is

It’s not a rumor - there are many public announcements about $B deals around compute for other AI companies

zacksiri 1 hour ago|||
Do you have similar math for the flash-lite variant of the models? I'd be curious. Based on my testing / benchmarks I think it's around the 100-120B mark.

With the Pro variant being around 600B - 800B

My testing is comparing its performance / output to other models in the same size range, so not as scientific as yours.

anthonypasq96 1 hour ago|||
given this, is it safe to assume that inference pricing is barely related to cost to serve at this point and there is considerable margin?
Maven911 1 hour ago||
Tell me more about what your day looks like. What do you think of the LLMOps books from Abi, in case you have read them? Any other resources you can recommend?
simonw 8 hours ago||
The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment...

Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars.

Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl...
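
A quick check of that 13 cent figure: the link's query string shows 11 input tokens and 14403 output tokens, and the Gemini 3.5 Flash prices quoted elsewhere in this thread are $1.50 / $9.00 per million input / output tokens. The function name below is hypothetical and only for illustration.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one API call given per-million-token prices in USD."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# 11 input tokens and 14403 output tokens, at $1.50 / $9.00 per million.
pelican_cost = request_cost_usd(11, 14403, 1.50, 9.00)
print(f"${pelican_cost:.4f}")  # rounds to 13 cents
```

Almost all of the cost comes from the output tokens; the 11 input tokens contribute a small fraction of a cent.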

hedgehog 8 hours ago||
That pelican looks like it's in Miami for a crypto conference.
seemaze 3 hours ago|||
That pelican wears its sunglasses at night. So it can, so it can keep track of the visions in its eyes.
baochillchill 1 hour ago|||
It looks quite funny.
whh 3 hours ago|||
Pelican and I need an optometrist urgently
joseda-hg 8 hours ago||||
It looks like the starting soon screen of a crypto presentation
xattt 8 hours ago||||
It looks like it’s been partying for 60 years based on the wrinkles on its pouch.
coffeecoders 4 hours ago||||
That pelican looks like it lost 100k on NFTs and now runs a paid stock-trading group.
Xenoamorphous 7 hours ago||||
Pelican in a white Testarossa.
airstrike 4 hours ago||||
They're called ClawCons now
sho_hn 3 hours ago||
Personally, I don't attend them since I figured out I can set up agents to performatively engage in AI-related discussion and events for me, freeing up tons of my time thanks to automation.

Truly: Nothing better than AI tools to brave the challenges and requirements of modern life. "Claude, ride the hype train" is the decisive prompt you need.

egillie 7 hours ago||||
and somehow in 1992
brindleth 5 hours ago||||
It looks like the start of a new viral Peliwave aesthetic
verdverm 7 hours ago||||
sorta looks like the Tron ripoff in the I/O keynote
irthomasthomas 8 hours ago|||
This is a perfect illustration of something I noticed with llm progress. Ask them to improve an svg like this, and it never fixes the missing crossbar or disconnected limbs, it just adds more stuff. In this example they have obviously improved greatly, and it contains a ridiculous amount of detail, but they still get the basic shape of the frame wrong. It's weird. And the pattern shows up everywhere, try it with a webpage and it will add more buttons and stuff. I've even experimented with feeding the broken pelican svgs to an image model to look for flaws, and they still fail to spot the broken elements.

edit: fixed human hallucination

derefr 7 hours ago|||
When you say "improve an svg like this", how are you imagining setting that workflow up? Are you just feeding them the SVG to iterate on; or are you giving them access to a browser to look at the rendering of the SVG?

I ask because:

Insofar as the original pelican test is zero-shot, it effectively serves as a way to test for the presence of a kind of "visual imagination" component within the layers of the model, that the model would internally "paint" an SVG [or PostScript, etc] encoding of an image onto, to then extract effective features from, analyze for fitness as a solution to a stated request, etc.

But if you're trying to do a multi-shot pelican, then just feeding back in the SVG produced in the previous attempt, really doesn't correspond to any interesting human capability. Humans can't take an SVG of a pelican and iteratively improve upon it just based on our imagined version of how that SVG renders, either! Rather, a human, given the pelican, would simply load the pelican SVG in a browser; look at the browser's rendering of the pelican; note the things wrong with that rendering; and then edit the SVG to hopefully fix those flaws (and repeat.)

I imagine current (multi-modal and/or computer-use) LLMs would actually be very good at such an "iterative rendered pelican" test.

irthomasthomas 7 hours ago||
I'm talking about two types of improvement, model improving, and prompt based improving. I am noticing that the baseline output has a lot more going on, the model has improved, yet it still makes those obvious looking mistakes with the shape of the frame or disconnected limbs etc.

And I am saying that if you take one of these SVGs and ask an LLM to look for flaws, it rarely spots those obvious flaws and instead suggests adding a sunset and fish in the bird's mouth.

stared 5 hours ago||||
To a certain extent, it feels like a Sonnet 3.7 moment. Slightly overeager - you ask for a button color change, you see layout changes, new package dependencies, and the README rewritten from scratch - and not necessarily correctly.

When I ask for a pelican on a bike, I want the Platonic ideal of a pelican on a bike, not a vision of an alternative reality in which pelicans created bikes. Though, thinking about it again, maybe I should.

p1esk 3 hours ago||
What is “Sonnet 3.7 moment”?
stirfish 2 hours ago||
Slightly overeager - you ask for a button color change, you see layout changes, new package dependencies, and the README rewritten from scratch - and not necessarily correctly.
Araopa 2 hours ago||||
So we have to train llms on debugging too, not just how to make things (which you easily train by feeding the outputs).
sosborn 3 hours ago||||
This matches my experience with humans too FWIW.
emp17344 2 hours ago||
Why is there always an identical reply like this when anyone criticizes LLMs?
gowld 3 hours ago||||
It's because LLMs are fundamentally generative (creative), not truth-seeking or logic-seeking. Simple logic has always been incredibly expensive to impossible for LLMs.
girvo 5 hours ago|||
Their ability is best described as "spiky". To steal from aphyr: think kiki, more than bouba. What's interesting is that a lot of the models seem to have similar spikes and "troughs", though there are differences.
tantalor 8 hours ago|||
Forgetting the chainstay is typical of asking random people to draw a bicycle.

https://www.gianlucagimini.it/portfolio-item/velocipedia/

> most ended up drawing something that was pretty far off from a regular men’s bicycle

et1337 7 hours ago|||
Asking random people to write SVG gives even worse results
lxgr 6 hours ago||
Especially without being able to look at the rendered output! (At least I'd be surprised if modern server-side tool calls regularly include an SVG renderer that can show a rasterized version to the model to iterate on it.)
gpm 2 hours ago||
One of the many things Google was pitching today is that they're going to run things like google search with access to linux container environments to do things like run tool calls... which will presumably be able to rasterize SVGs and show them to the model.

But Simon says he runs these through the API without tool access specifically to prevent that sort of "cheating". I.e. it's an LLM benchmark not an LLM+Harness benchmark.

Eji1700 5 hours ago|||
Although every single render of those has pedals on the correct side as opposed to the Gemini optical illusion back pedal that tries to be both on the other side of the central gear and in front of the back wheel.

Not really a criticism but an interesting point that you would never expect a human to make that mistake even in a bad drawing.

smcleod 8 hours ago|||
I feel like it embodies Google's vibe of an uncool guy trying to stay relevant to the youth pretty well.
dzhiurgis 2 hours ago||
That's grok. IMO both gemini and grok are the most overlooked models.
tandr 3 hours ago|||
If you sort that table by "output token price", it gets really terrifying - going from 4 cents up to $600 =8-O
nrds 3 hours ago|||
We've been daily-driving this model for a few weeks and let me tell you, everything it does is a lot. Fast as fuck and it's actually not bad intelligence-wise for a fast model. It basically tries to make up for any intelligence deficit by just doing a lot, checking a lot, retrying a lot.

That's not to say I don't spend my days raging at it... a lot... but it's not that bad. It does tend to ignore completion criteria but it doesn't obviously degrade when being nudged like some models do.

dekhn 3 hours ago|||
I'm told there is a new Jeff Dean fact inside google: "Jeff Dean manually adjusts the weights in the model just to screw with Simon".
bee_rider 1 hour ago|||
I wonder if they added all these unrequested details as an Easter-egg or something? (Since they must be aware of your test by now).
karmakaze 3 hours ago|||
I'm hoping we'll have many of these pelican cyclist pictures collected. Then when all the models can do it well, we'll stop posting about them, and when the next generations of AIs train on the data we'll have these canonical archetypes.
taurath 3 hours ago|||
I can’t help but think that what AI is best at is convincing management that things it creates are full featured which reads to their brains as mature
hydra-f 8 hours ago|||
Same old issue with Gemini models trying to "enrich" everything
nickvec 6 hours ago|||
I enjoy the vaporwave aesthetic it went for. Looks like the pelican has a fish in its mouth too?

https://en.wikipedia.org/wiki/Vaporwave

khy 7 hours ago|||
That sun is very similar to the one from the background of this other top HN post about the OS museum: https://news.ycombinator.com/item?id=48195009
sbinnee 5 hours ago|||
Wow what’s with all the styling? Is it manifestation of google’s styling bias? I like the result for sure. It’s shiny and pretty. But then it’s something I didn’t ask for.
danilocesar 3 hours ago|||
Given your pelican is very famous now, don't you think they are adding instructions to beat this benchmark these days?
Culonavirus 3 hours ago||
Well clearly it's not working lmao
Razengan 2 hours ago|||
I've found prompts like "capybara with spotted fur and 7 octopus tentacles instead of legs, each a different color, riding a tricycle" etc. to be a better test

Last time I tried, ChatGPT's image generator got the best result.

__mharrison__ 6 hours ago|||
They are just trolling you now
gcgbarbosa 8 hours ago|||
funny that when I try the same prompt, gemini generates an image, not an SVG. something is not right.
simonw 8 hours ago||
That's likely because you're using the Gemini app which has a tool for image generation (nano banana) - I do my tests against the API to avoid any possibility of tool use.
nickmccann 7 hours ago||
This question makes me wonder if you one shot each pelican or do you run it a few times to get the best one?
simonw 5 hours ago||
I one-shot. I have a long-standing ambition to have each model generate 3x and then get the model (assuming it's a vision model) to pick the best one.
setgree 7 hours ago|||
`<!-- Pelican Eye / Sunglasses (Cool Retro Aviators) -->`

wtf

`<!-- Gold Rim -->`

WTF??

nashashmi 8 hours ago|||
Beats a human by like 10$
unglaublich 8 hours ago||
So according to Google logic, the value of the pelican is $10-eps. (They applied that reasoning to conversions via adwords)
TacticalCoder 5 hours ago|||
Love your pelicans, as always. And that one is... Wow.

I noticed the "Synthwave" aesthetic, which has been enjoying quite some success for quite some time now, has found its way into AI models (even when it's not in the user's query). It's not the first time I see the sun at sunset with color bands etc. in AI-generated pictures. Don't know why it's now catching on in AI too.

https://en.wikipedia.org/wiki/Synthwave

Hence the comments here about the 90s, Sonny Crockett's white Ferrari Testarossa in Miami, etc.

To be honest as a kid from the 80s and a teenager from the 90s who grew up with that aesthetic in posters, on VHS tape covers, magazine covers, etc. I do love that style and I love that it made a comeback and that that comeback somehow stayed.

kridsdale3 4 hours ago|||
Synthwave vibe hype hit a cultural high point with the release of Far Cry 3 Blood Dragon in 2013.

So it's as relevant and baked-in to today as actual 80s synth-culture was in 2000.

gowld 3 hours ago|||
At the keynote today, Sundar Pichai asked Gemini to clone the Dino Game, and it added a synthwave-esque aesthetic.
holtkam2 8 hours ago|||
at a certain point you're gonna need to change your benchmark because this will end up in the model's training set
simonw 8 hours ago|||
Gemini were the team most likely to have this in their training set - see https://x.com/JeffDean/status/2024525132266688757 - and yet their latest model still messes up the bicycle frame!
recursive 6 hours ago|||
I'm sure that certain point came and went many releases ago.
GodelNumbering 9 hours ago||
Per million input/output tokens:

Gemini 2.5 flash: $0.30/$2.50

Gemini 3.0 flash preview: $0.50/$3.00

Gemini 3.5 flash: $1.50/$9.00

Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate next same-sized model (and lol @ 3 only ever getting a preview).

3.5 flash costs similar to Gemini 2.5 pro which was $1.25/$10

__jl__ 7 hours ago||
This understates the cost increase. 3.5 Flash also uses more tokens. artificialanalysis.ai shows these differences in the cost to run the whole eval, which I think is more realistic pricing:

Gemini 2.5 flash (27 score): $172 (1.0x)

Gemini 2.5 pro (35 score): $649 (3.8x)

Gemini 3.0 Flash (46 score): $278 (1.6x)

Gemini 3.5 Flash (55 score): $1,552 (9.0x or 2.4x compared to 2.5 pro)

This is a massive price increase... 5.6x compared to Gemini 3.0 Flash
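
The multipliers above can be reproduced from the eval costs listed in this comment. This is a small sketch; the dictionary and function names are hypothetical, and the dollar figures are copied from the list above rather than fetched from artificialanalysis.ai.

```python
# Cost in USD to run the whole eval, as listed in the comment above.
EVAL_COST_USD = {
    "Gemini 2.5 Flash": 172,
    "Gemini 2.5 Pro": 649,
    "Gemini 3.0 Flash": 278,
    "Gemini 3.5 Flash": 1552,
}


def cost_ratio(model: str, baseline: str) -> float:
    """How many times more expensive `model` is than `baseline` on the full eval."""
    return EVAL_COST_USD[model] / EVAL_COST_USD[baseline]


if __name__ == "__main__":
    for baseline in ("Gemini 2.5 Flash", "Gemini 2.5 Pro", "Gemini 3.0 Flash"):
        ratio = cost_ratio("Gemini 3.5 Flash", baseline)
        print(f"3.5 Flash vs {baseline}: {ratio:.1f}x")
```

The ratios come out to roughly 9.0x, 2.4x and 5.6x, matching the figures quoted.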

doginasuit 8 hours ago|||
They probably never intended to keep serving cheap models. This is a natural way to introduce the squeeze, now that they have people who built services on their API. It makes a lot of sense to have an abstraction layer where the provider doesn't matter. If you are working in Kotlin, Koog is excellent.
lanthissa 7 hours ago|||
switching models is insanely cheap compared to token cost on anything significant, this is a take so cynical it misses the reality
Clueed 6 hours ago||
in any corporate or half compliance-relevant setting switching isn't trivial. new DPA, subprocessor notifications, TIA, procurement review, security questionnaires, plus re-running your evals because prompts don't transfer 1:1. token cost is just one of the line items.
lanthissa 5 hours ago|||
no it's really not, even the soggiest bank has multiple api vendors atm.
alexandre_m 3 hours ago||
I agree with parent. I'm not sure where your stance is coming from.

From what I hear, most enterprise AI deployments are seat-based subscriptions with annual commitments.

opsnooperfax 2 hours ago|||
50K FTE global firm. We’re still piloting ChatGPT. AI is a four-letter word and there are ridiculous ceremonies and hundreds of hours of overhead for every trivial use case.

Amusingly, Enterprise credits are more expensive than just paying a zero-commitment on-demand API fee. Personal accounts are still the best value.

p1esk 3 hours ago|||
Yes, I work at a 50 person startup and even here switching from CC to codex or cursor would be non-trivial for multiple reasons - not just the annual commitment.
opsnooperfax 2 hours ago||||
I think the big 3 are cartelizing and starting to ratchet up costs. GPT5.5 is not easily distinguishable from 5.1. I wouldn't be shocked if we hit the ceiling and everyone is quietly positioning for the exit.
hnarn 8 hours ago|||
> now that they have people who built services on their API

People really can’t wait to be the next Zynga

rudedogg 9 hours ago|||
If Google is actually getting cheaper inference than everyone else with their TPUs, this smells like trouble to me. Maybe serving LLMs at a profit is proving difficult.

Or maybe they think because their benchmarks are good they can ramp up the prices. Seems like they don’t have the market share to justify a move like that yet to me.

tempaccount420 8 hours ago|||
This is not priced at inference cost.

My guess: it's the price at which they make more money than if they rent the TPUs to other companies.

The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?

gpm 7 hours ago||
The price at which they could rent out the TPUs, i.e. the market rate, is the inference cost.

Just because you are vertically integrated doesn't mean you get to discount one business unit's products to the other. Doing so discounts the opportunity cost you pay and is just bad accounting.

KoolKat23 4 hours ago|||
Basic business principle, you charge what people are willing to pay not what it costs.
dash2 6 hours ago||||
Look up “double marginalisation”.
HDThoreaun 6 hours ago|||
Depends on if you have spare capacity I think. They have minimal competition so they might be maximizing profit by charging prices higher than what clears all their supply.
spyckie2 7 hours ago||||
It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models so cheap models need to move up the quality chain.

You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and API per token costs for frontier intelligence.

Flash seems to be targeting the near-frontier category.

TurdF3rguson 7 hours ago||
That might work if it wasn't for FOMO. Are you ok with only $20 of frontier usage a month?
rohansood15 3 hours ago||
Subjective, but if we compare to compute not everyone needs the most expensive laptops or super computers for their work.

I think frontier models will be invaluable for scientific research, defense, financial analysis and such. But the average person probably would be reasonably well-served with a local model.

If you're in sales, customer service, product management and such - the leading open models at the 30B mark are already good enough.

booty 7 hours ago||||
Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.

Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.

https://www.together.ai/pricing

https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)

Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.

But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.

...my opinions here are of course, conjecture built on top of conjecture....

eklitzke 3 hours ago|||
Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.

I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.

HDBaseT 5 hours ago|||
Not to discredit you, because you are 100% correct but tangential note about together.ai, they seem fairly unreliable with constant outages or higher than normal latency.
BoorishBears 6 hours ago||||
This is trouble if you're not Google/OpenAI/Anthropic: they're all shifting towards pricing for the economic value of the knowledge work they're aiding.

The economic value increases non-linearly as models get more intelligent: being 10% more capable unlocks way more than 10% in downstream value.

That's trouble because the non-linear component means at some point their margins will stop being primarily defined by the cost of compute, and start being dominated by how intelligent the model is.

At that point you can expect compute prices to skyrocket and free capacity to plummet, so even if you have a model that's "good enough", you can't afford to deploy it at scale.

(and in terms of timing, I think they're all well under the curve for pricing by economic value. Everyone is talking about Uber spending millions on tokens, but how much payroll did they pay while devs scrolled their phones and waited for CC to do their job?)

IncreasePosts 8 hours ago|||
Maybe the margins are just very large for Google because they predict so much demand for 3.5?
GodelNumbering 8 hours ago||
This combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6) tells me that it's time to seriously consider local dev setup again
MASNeo 8 hours ago|||
Besides the cost you get the control, transparency and ability to identify small language models or LoRA you want to serve even more cost-effectively.
cft 8 hours ago|||
This should become the new Apple's hardware and software play. I am hopeful about the new CEO
hei-lima 8 hours ago|||
We need another "Deepseek moment" or else it will become impossible for the regular dude to use AI. It will become something that only big companies can afford.
SwellJoe 7 hours ago|||
We're having DeepSeek moments every couple of weeks.

Qwen 3.6 hit hard in the self-hosting space. It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.

The Prism Bonsai ternary model crams a tremendous amount of capability into 1.75GB.

And, DeepSeek V4 is crazy good for the price. They're charging flash model prices for their top-tier Pro model, which is competitive with the frontier of a few months ago.

The winners in the AI war will be the companies that figure out how to run them efficiently, not the ones that eke out a couple percent better performance on a benchmark while spending ten times as much on inference (though the capability has to be there, I think we're seeing that capability alone isn't a strong moat...there's enough competent competition to ensure there's always at least a few options even at the very frontier of capability).

Zambyte 7 hours ago|||
> It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.

You can lower that to at least 24GB. I've been running Qwen 3.5 and 3.6 with codex on a 7900 XTX and the long horizon tasks it can handle successfully have been blowing my mind. I would seriously choose running my current local setup over (the SOTA models + ecosystem) of a year ago just based on how productive I can be.

hei-lima 3 hours ago||
Gonna try it.
trollbridge 6 hours ago|||
We have Qwen 3.6-35b (6) on a 5090 (32GB) and it's blowing me away. Works fine for most (not all) code generation tasks. One developer here has been extremely stubborn about adopting AI; he's finally adopted it, albeit only when it's coming from a local model like this.

DeepSeek V4 Pro likewise is insanely good for the price. I simply point it at large codebases, go get a cup of coffee or browse Hacker News, and then it's done useful work. This was simply not possible with other models without hitting budget problems.

akulbe 5 hours ago||
Any chance you'd be willing to talk further about your setup? I have 2 x 3090s in a local machine, and I'm still left with questions about how best to use stuff locally.
sheeshkebab 3 hours ago||
You can only run heavily quantized models on all 3/4/5 rtx gpus (with 32gb or less vram) - and you probably want moe versions like Qwen 35b for this to run at speed somewhat comparable to Claude. It’s still not there to be honest but getting there. Personally I mess around with llama.cpp on m5 max with 128gb - it’s a decent setup to try various medium sized things, and runs llms surprisingly well without quantization, at least the moe models.
SwellJoe 3 hours ago||
Two 3090s is 48GB, so it's possible to run the 6-bit quantization comfortably, which is fine. It doesn't start to get notably dumber until lower than that. It won't be as fast as a hosted model, but dual 3090s will be comfortably fast for interactive use with the MoE version and not terrible to use with the dense model. I run the dense model at 8 bits on my dual Radeon V620 desktop machine, which I think would be slower than two 3090s, or at least not notably faster.
hedgehog 3 hours ago||
Have you done comparisons with 4 bit and seen a noticeable difference for coding tasks?
SwellJoe 1 hour ago||
No, I've just seen benchmarks showing most models start degrading around 4-5 bits. That's not to say they become useless, just that down to about 6-bits (with careful hybrid quantizations like unsloth where some of the layers aren't quantized or are quantized at higher bit depths) the quality isn't measurably degraded, but below that there are measurable differences in performance.

People report good results from DeepSeek V4 Flash at 2 bits (the DwarfStar 4 folks are doing it, and I've tried it on my Strix Halo, but it's too slow to be usable, so I haven't bothered to figure out if it's actually smart enough to use for anything).

Anyway, it's obvious models have to degrade in terms of knowledge, at any quantization, even though it may not show up clearly on benchmarks until lower. If you halve the size of the data available, it necessarily loses information about the world.

squidbeak 8 hours ago||||
Deepseek had another moment a few weeks ago. V4 isn't far behind the US frontier, and so far its flash variant seems a very reliable coder and costs a pittance.
ai_fry_ur_brain 8 hours ago||
Deepseek V4 (not flash) tripled in price too by the way (from Deepseek). Get used to this pattern.

This is what you get for relying on the generosity of billionaires. Keep offshoring your thinking ability to a machine and let me know how competitive you are. Hint: you won't be. There's nothing special about being able to use an LLM.

npn 8 hours ago|||
Unlike other providers, Deepseek does promise that they will lower the price when their Huawei cards arrive in a few more months.
flakiness 6 hours ago||
Give me a link. Cannot wait. One PSA is that they have a 75% discount right now so it is already cheaper than the full price.
npn 6 hours ago||
Weird, last time I checked it was right on the pricing page.

But even when it happens I doubt it would be as cheap as it is right now. Enjoy it while it lasts!

ls612 7 hours ago||||
Anyone can host Deepseek V4 on rented GPUs and sell inference on it. Price will very quickly converge to the marginal cost of inference. This is as close to a pure commodity as it gets in the AI space so competitive market economics will put in work. Same is true for any open-weights model.
ai_fry_ur_brain 7 hours ago||
You don't understand the costs involved in running inference at scale.

Please go run some numbers. The hardware needed to run DeepSeek V4 Flash at 20 tps for a single session is nowhere close to what is required to run it at 50 tps for 5,000 concurrent sessions.

Imagine what it takes to be profitable when running at 150 tps for 30 cents per 1M tokens. You make less than $1k per month, and the hardware required to run that costs $10k a month to rent, with hardly any concurrent session capability.
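The arithmetic behind that claim can be sketched as follows. The throughput, price and rental figures are the ones quoted in the comment; the 30-day month, 100% utilisation, output-only billing, and the idea that each extra stream still gets the full 150 tps are simplifying assumptions made here.

```python
import math

SECONDS_PER_MONTH = 60 * 60 * 24 * 30   # assumes a 30-day month at 100% utilisation


def monthly_revenue(tokens_per_second, usd_per_million_tokens, concurrent_sessions=1):
    """Revenue from selling every generated token at the given price."""
    tokens = tokens_per_second * concurrent_sessions * SECONDS_PER_MONTH
    return tokens / 1_000_000 * usd_per_million_tokens


single_stream = monthly_revenue(150, 0.30)   # 150 tps at $0.30 per 1M tokens
rent_per_month = 10_000                      # rental cost quoted in the comment

# Streams needed to cover rent, assuming (optimistically) that throughput
# per stream does not drop as more streams share the hardware.
breakeven_streams = math.ceil(rent_per_month / single_stream)

print(f"one stream earns ${single_stream:,.2f} per month")
print(f"break-even needs about {breakeven_streams} concurrent streams")
```

Under these assumptions a single stream earns roughly $117 a month, so the question becomes whether the hardware can sustain on the order of 86 such streams at once.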

gpugreg 5 hours ago|||
> Please go run some numbers.

- DeepSeek serves DeepSeek V4 Pro at 27 tps: https://openrouter.ai/deepseek/deepseek-v4-pro

- At 27 tps per user, a B300 GPU will give you around 800 tokens per second (serving 30 users): https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...

- That's 800 * 60 * 60 generated tokens per hour, at a cost of $0.87 per 1M tokens, or $2.50 per hour.

- For input and output tokens, the math is a bit more complicated because we have to make assumptions about their ratio. Using the published values from OpenCode, we get another $2.50 for cached tokens (which are almost free for DeepSeek) and another $3.40 for input tokens (which are a lot cheaper to compute than output tokens), which gives us a total of $8.50 per hour per B300 GPU.

- B300 GPUs can be rented for as low as $3.40 per hour, which is less than $8.50, so hosting DeepSeek V4 Pro is profitable.

You could also host it at fewer tps per user to raise the efficiency and therefore the profit even higher.
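The bullet-point estimate above can be reproduced in a few lines. Every number is taken from the comment as stated; the variable names are made up for this sketch, and the cached-token and input-token revenue figures are used as given rather than derived.

```python
AGGREGATE_TOKENS_PER_SECOND = 800   # B300 throughput at ~27 tps per user
OUTPUT_PRICE_PER_MILLION = 0.87     # USD per 1M output tokens
CACHED_REVENUE_PER_HOUR = 2.50      # USD, as quoted in the comment
INPUT_REVENUE_PER_HOUR = 3.40       # USD, as quoted in the comment
RENTAL_PER_HOUR = 3.40              # USD, lowest B300 rental price quoted

output_tokens_per_hour = AGGREGATE_TOKENS_PER_SECOND * 60 * 60
output_revenue_per_hour = output_tokens_per_hour / 1_000_000 * OUTPUT_PRICE_PER_MILLION

total_revenue_per_hour = (
    output_revenue_per_hour + CACHED_REVENUE_PER_HOUR + INPUT_REVENUE_PER_HOUR
)
margin_per_hour = total_revenue_per_hour - RENTAL_PER_HOUR

print(f"output revenue: ${output_revenue_per_hour:.2f}/h")
print(f"total revenue:  ${total_revenue_per_hour:.2f}/h")
print(f"margin:         ${margin_per_hour:.2f}/h per GPU")
```

Summing the three components as stated gives about $8.40 per hour, in line with the roughly $8.50 quoted, and comfortably above the rental price either way.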

ls612 5 hours ago||
Even not assuming Blackwell inference the $3.50/hr price is likely close to the marginal cost. The DeepSeek R1 model is a little more than a third of the size of V4 and cost around $1/Mtok to serve at scale based on DeepSeek's blog posts last year and Hopper rental prices.
ls612 7 hours ago|||
Yes it is more efficient in $/tok to run at scale than to run just for yourself. Everyone selling Deepseek V4 inference is selling an undifferentiated good. They have run the numbers on how much it costs and are competing against a dozen other outfits also selling undifferentiated open weights tokens. Whatever the dollar cost they face to rent those GPUs will be what they are able to charge in the competitive market. That is great for you and me because we can buy tokens at pretty much exactly what it costs to produce them.
dpoloncsak 8 hours ago||||
Mate, why are you so mad at people upset the price tripled? It's a fair complaint that people built services using the cheaper ones with the expectation future models would be similarly priced. You can avoid 'offloading thinking' while still building on top of these models.
zaptrem 6 hours ago||||
V4-Pro is about 2.4× total params and 1.3× active params of V3.2.
creationcomplex 5 hours ago||||
You're typing as your handwriting and letter sending abilities deteriorate to dust. Writing down information as your memory capacity decays. Remembering instead of living at the pure leading edge of perception dulling your reactions.

Smh, it's all downhill from the first unadulterated neuron.

aurareturn 8 hours ago|||
I think demand is too great and compute is not enough. Nothing to do with billionaires colluding to increase prices by 3x.
boutell 5 hours ago||
Actually, why should Google collude on pricing? They have deep pockets and could starve out the competition while keeping prices low, if they really wanted.

I think it is priced high because it's basically their smartest model as well as their fastest, so why shouldn't they?

You can still use earlier generations of Flash at a lower cost if you want "fast and cheap and just OK," which often makes sense. (Just checked)

I would predict they will lower this price when 3.5 High appears, but perhaps not all the way.

xbmcuser 7 hours ago||||
What we need is a DeepSeek moment in hardware, i.e. China reaching parity on node size. That is the only way the latest computers, let alone the latest AI, will be available to us in the future; otherwise the profit margins will push most production to AI.
blackoil 23 minutes ago|||
Open Source ASML EUV. But will wipe off trillions from US stocks so 401k may not like that.
throwa356262 7 hours ago|||
To be honest, China not having access to the latest hardware is exactly what has driven LLM technology forward the last 2 years.
humanfromearth9 7 hours ago||
Why?
Weryj 7 hours ago|||
Because it forced them to focus on efficiency, instead of throwing more compute at the problem.

Just like in software, some of the most beautiful solutions come from constraints. Think, the optimisations that game developers implemented because of the frame budget.

Viacol 2 hours ago|||
On top of that, China is also facing hardware constraints, which is pushing companies to develop better domestic chips for AI training. It'll be interesting to see how things perform once Huawei's newer hardware is fully deployed at DeepSeek.
stared 4 hours ago||||
We have a "DeepSeek moment", https://github.com/antirez/ds4 (see https://news.ycombinator.com/item?id=48142108).

Or if you prefer smaller ones, Qwen3.6-35B-A3B, https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF

segmondy 8 hours ago||||
You can use lots of open weight models today.
hei-lima 8 hours ago|||
That's one solution to the problem. But it still needs some good computational capabilities. Either we optimize the hell out of those models, or we wait for the hardware to become good enough for them.
Gigachad 5 hours ago|||
The real problem is the hardware to run them is still very expensive.
pianopatrick 6 hours ago||||
Maybe we can figure out better ways to use the models that can run on cheap hardware.
GeorgeOldfield 8 hours ago|||
gemini isn't even that good. just tested 3.5 on usual complex prompts to opus/chat 5.5. meh
k8sToGo 7 hours ago|||
Are you really comparing flash to opus? Shouldn't you be comparing pro?
CognitiveLens 7 hours ago||
The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results confirm expectations
bachmeier 7 hours ago||||
Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks.
kmac_ 7 hours ago|||
Well, the first impression is that Gemini still goes off the instruction rails easier than other models, but I noticed that it tends to go back to the initial goal without holding a hand, which is a real improvement. It's really interesting that these models behave so differently.
fnordsensei 9 hours ago|||
3.5 flash is listed as stable rather than preview, or am I misreading?

https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...

GodelNumbering 8 hours ago||
ah I mistakenly wrote preview
dr_dshiv 8 hours ago|||
3.1 flash lite — $0.25/$1.50 — plus insanely fast.

3.1 flash lite isn’t quite as good as 3 flash preview (which is the most incredible cheap model… I really love it) — but 3.1 is half the price and the insane speed opens up different use cases.

For comparison, Opus models are $5/$25

SwellJoe 8 hours ago||
Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric, though. You're comparing apples to oranges. Gemini 3.1 Flash is somewhere in the neighborhood between current Haiku and Sonnet, I think? Still a better value than the Anthropic models, I guess, which are quite pricey.

Since Gemini 3.5 Flash is raising the price to $1.50/$9.00, it's priced between Haiku and Sonnet. If it outperforms Sonnet, it remains a good value, I guess. Though DeepSeek V4 Flash is much cheaper than all of them, and seemingly competitive.
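To put the per-token prices quoted in this subthread side by side, here is a small sketch. The input/output prices are the ones mentioned above; the workload size (10M input tokens, 2M output tokens) is a made-up example, and the calculation ignores caching discounts and any tiered pricing.

```python
# (input, output) prices in USD per 1M tokens, as quoted in the thread.
PRICES = {
    "Gemini 3.1 Flash Lite": (0.25, 1.50),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Opus": (5.00, 25.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """Cost in USD of a workload at the model's list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
costs = {model: workload_cost(model, 10_000_000, 2_000_000) for model in PRICES}
for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}")
```

For this particular mix, 3.5 Flash comes out at about a third of the Opus price and six times the Flash Lite price; a workload with a different input/output ratio will shift those ratios somewhat.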

WarmWash 6 hours ago||
>Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric,

Outside of coding, claude models are pretty meh. GPT and Gemini are the workhorses of science/math/finance.

robwwilliams 5 hours ago|||
Not in my fields of science: genetics and neuroscience. The combination of Opus 4.7 Adaptive used with well-structured project folders is amazingly useful.
epolanski 4 hours ago|||
And even on coding, they are mostly good at generating new code.

They sure are not at thorough analysis or debugging, etc.

OakNinja 7 hours ago|||
To be fair, Gemini 3.1 flash _lite_ supports structured output (guaranteed json), it’s super fast, runs circles around 2.5 flash and costs $0.25/$1.50.

I use it _a lot_ and it’s very capable if you just plan correctly. I actually almost exclusively use 3.1 flash lite and 2.5 flash lite (even cheaper) and we have 99.5% accuracy in what we do.

That said, I think we’ll see the lite/flash models and the pro models will diverge more price wise. The pro models will become more and more expensive.

WhitneyLand 8 hours ago|||
Their rationale might be that its size and intelligence are growing relative to the market.

Fwiw it’s beating Claude Sonnet in most benchmarking (benchmaxxing?), yet they’ve priced it almost half off on a per token basis.

Question is are you going to persuade anyone with this argument?

Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

SyneRyder 7 hours ago||
> Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

A few weeks ago, Steve Yegge claimed he'd heard that Google employees are banned from using Claude & Codex.

https://x.com/Steve_Yegge/status/2046260541912707471

A number of Googlers replied to say that was totally false, including Demis Hassabis, but they were all on the DeepMind team.

https://x.com/demishassabis/status/2043867486320222333

This person here claims they left Google because of the ban, and because the ban applied outside of Google work as well:

https://x.com/mihaimaruseac/status/2046272726881693960

myko 1 hour ago||
> and because the ban applied outside of Google work as well

I think false (or hasn't filtered to everyone lol)

dbbk 9 hours ago|||
I don't think they're really comparable. Seems they created the Flash-Lite tier to take the spot of the old Flash models.
GodelNumbering 8 hours ago||
No, 2.5 had both flash and flash lite.
mlmonkey 7 hours ago||
It is Google, after all ....
photonair 8 hours ago|||
In general, Gemini Flash is still relatively cheap compared to the "mini" versions from the other big 2. However, I agree that newer versions seem to come with multi-X price increases (similar to the new ChatGPT), and we certainly need competition from the open source models to keep these guys in check on pricing.
malloryerik 1 hour ago|||
To me this is almost like a tone-deaf naming change.

Empty Slot (new Pro as Mythos competitor?)

Old Pro -> now Flash

Old Flash -> now Flash Lite

Old Flash Lite -> now Gemma (and not served by Google)

I say "almost" because the situation is more fluid and unstable than a normal naming change. If Apple were to do this with laptops, maybe it'd be like, Air gets better and pricier and becomes Pro-level model, Neo same way becomes Air-level model, etc. But Apple's too design oriented to do something like that. Google, well...

This change has made me decide to move to a multi-provider situation like through OpenRouter for consumer-facing LLM api in a service I'm building. I just can't trust Google to not constantly rearrange everything under our feet. Doesn't mean I won't use Gemini, but it clearly means I need to have others in the mix ready to go. In fact I used to use lots of Flash Lite, which is now Gemma territory, and I can't get that served by Google anymore and don't want to run my own hardware.

But in any case, I'd compare this "Flash" model with previous "Pro" on all metrics. It's kinda like if in clothes a Small suddenly became what was a Large, or at Starbucks a Grande became the new de facto Venti. And only for the new! drinks.

And if we think this way, it's possible that prices are actually falling?
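The multi-provider setup described above could look something like the sketch below. It is deliberately generic: the provider functions are hypothetical stand-ins written for this example, not calls into OpenRouter or any real SDK, and a production version would also want timeouts, retries, and logging.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return the first successful reply."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:   # any provider error triggers the fallback
            failures.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(failures))


# Hypothetical providers: the first simulates a model that was renamed or retired.
def primary_model(prompt: str) -> str:
    raise RuntimeError("model no longer served")


def backup_model(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "hello", [("primary", primary_model), ("backup", backup_model)]
)
print(used, reply)
```

Keeping the provider list as plain data means a tier reshuffle like the one described only requires editing that list, not the calling code.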

LetsGetTechnicl 8 hours ago|||
Gen AI is unprofitable, especially at the insanely cheap rates they've been offering to get people in the door. So expect more increases in the future.
roadside_picnic 8 hours ago|||
These companies are unprofitable (as all companies at this stage and ambition should be) but I increasingly don't see any justification for the idea that it is fundamentally unprofitable.

Inference alone is certainly profitable. I'm running models at home that are comparable to performance of paid models a year or so ago for free. Even for much larger models the cost around inference serving are clearly manageable.

Training is where the costs are, but I'm increasingly convinced those too could have costs dramatically reduced if necessary. Chinese companies like Moonshot.ai are doing fantastic work training frontier models for a fraction of the cost we're seeing from Anthropic/OpenAI.

This isn't like Uber or Doordash where the economics fundamentally don't make sense (referring to the early days of these services where rates were very cheap).

It's a compelling story that "current AI is unsustainable", but it doesn't pan out in practice for a multitude of reasons (not the least of which is that we can always fall back to what models did last year for basically free).

ReliantGuyZ 7 hours ago|||
And if you can run those strong models at home for free, why would hosting them be a successful business for any of these providers?

Profitable maybe, in terms of having low costs, but why pay Google or whoever when you can do it yourself for cheaper/"free"?

HDThoreaun 6 hours ago||
If you can run your server at home for free why would hosting it be a successful business for any of these providers?
overrun11 5 hours ago||||
Arguably nothing even has to change with training for this to be sustainable. Dario has claimed that Anthropic is profitable on a per training run basis. They aren't profitable because they choose to keep investing in increasingly large training runs.
dsdsfaa 3 hours ago||
Cut the crap.

The value of the firm's operating assets = EBIT(1-t) - Reinvestment

You (Anthropic) want that sky-high valuation? Accept reinvestment is part of the equation.

If they decide to stop reinvesting, then they are as good as dead.

Moreover, they clearly are not re-investing cash flows from operations. Why do you think they are continually raising money? Lmao.

LetsGetTechnicl 7 hours ago||||
If it's profitable, why haven't they reported any profits? People like Ed Zitron have done the math and it just doesn't add up. I mean he just published this piece today: https://www.wheresyoured.at/ai-is-too-expensive/
anthonypasq 7 hours ago|||
Amazon was unprofitable for over a decade, and they were public. There's no incentive to be profitable as a private company if you can continue to raise money.

Ed Zitron and Gary Marcus are... confused.

mynameisash 5 hours ago|||
> Amazon was unprofitable for over a decade, and they were public.

Amazon was unprofitable because they poured their revenue into growth. On paper, they were in the red, but everyone - especially investors - saw what was going to happen, given their trajectory.

Is it the case that any of these AI companies are actually making a ton of money and growing accordingly? AFAICT, we've just got [a] big players like Google that can subsidize AI in the hopes of waiting everyone else out and [b] private companies raising capital in the hopes that when the market returns to rationality, they may be solvent.

overrun11 5 hours ago||
Yes that is exactly what is happening. OpenAI and Anthropic are the fastest growing companies by revenue ever and their gross profit margins are healthy.
mynameisash 5 hours ago|||
According to this article[0]:

> HSBC Global Investment Research projects that OpenAI still won’t be profitable by 2030, even though its consumer base will grow by that point to comprise some 44% of the world’s adult population (up from 10% in 2025). Beyond that, it will need at least another $207 billion of compute to keep up with its growth plans.

This article is from six months ago. Was HSBC wrong; did something dramatically change in the last six months; is OpenAI not, in fact, profitable?, or are they in fact doing well but doing a huge investment (as was the case with Amazon 25ish years ago)?

I genuinely do not know, but my impression is that they're burning investment capital trying to compete with others' investment capital and Google's bottomless pockets.

[0] https://fortune.com/2025/11/26/is-openai-profitable-forecast...

LetsGetTechnicl 1 hour ago|||
Also OpenAI somehow having 44% of the world’s population as its customer base is a plainly absurd goal and will never happen, not in 5 years
dsdsfaa 3 hours ago|||
and to make matters worse, they are massively over-valued.

Whoever buys the stock at a richly priced 1tn at ipo is a bozo lmao. I know I know, index funds will be forced to hold it bypassing the 1 year rule. Disaster already.

LetsGetTechnicl 1 hour ago|||
Then why do they constantly need more and more funding from VC and Google and MS and NVIDIA? Why is it all circular dealing? Why aren’t there smaller AI startups running these smaller, “profitable” models?
timmytokyo 6 hours ago|||
But I've been told here -- over and over again -- that the cost of inference was going to go down as the technology matured.

The trend lines are going in the opposite direction.

goosejuice 7 hours ago|||
His entire brand is that the AI bubble will burst. By his account it was supposed to have burst several times by now. Like the doomers, it's not if, it's when, and they have to keep pushing back their predictions. Funny how both camps can be so confident. Alas, that's how they get eyes, ears and dollars.

That's not to say they will be or are wrong, it's just that they aren't exactly unbiased, or humble, sources.

booty 7 hours ago|||
Yeah, at this point I think the worst-case scenario for OpenAI/Anthropic/etc is to slow down frontier model development and focus on tooling and services, as opposed to imploding completely and bursting the economic bubble. I hope?
GaggiX 8 hours ago||||
If you don't need SOTA or near SOTA there are plenty of dirt cheap models, just look at Gemma 4 31B on Openrouter.
Gigachad 5 hours ago|||
For all of the use cases being hyped you really do, and you actually need something much better than the SOTA models to do what we are being told can be done.

The small models are useful for small things like summarizing text or search but not much else.

LetsGetTechnicl 1 hour ago||
Yeah a lot of AI hype is look at the amazing new thing our new model can do! Like Google at this event. But when pressed about its pricing reality the answer is “use a worse cheaper model”?? Real convincing argument there
ai_fry_ur_brain 8 hours ago|||
[flagged]
npn 8 hours ago||||
It is insanely profitable though, if you cut out r&d cost, plus the marketing and loss leaders. Don't let them gaslight you.

Even anthropic who does not own any hardware still have a big margin providing claude models.

LetsGetTechnicl 7 hours ago|||
Then why haven't they reported any profits using GAAP (generally accepted accounting principles)? They all use ARR which is easily gamed.
overrun11 5 hours ago|||
They aren't profitable on a GAAP basis and no one claims this. This obsession over profits is misguided. These are hyper growth companies growing at a scale never seen before. It is both deliberate and uncontroversial to invest in growth rather than slowing down to produce profits.
chillfox 1 hour ago||
If my retirement money is going to end up invested in these companies, either directly when they IPO or indirectly through compute providers, then I would like to see some proof that they are capable of producing profits. "Trust me bro" just ain't gonna cut it.
npn 6 hours ago|||
I'm not really sure, but it might be that they count hardware purchases as losses, too.

Google has just recently upgraded their TPUs.

timmytokyo 6 hours ago|||
Everything is insanely profitable if you ignore the costs.
npn 1 hour ago|||
The premise is if they stop training new models then it will become pure profit after 2 years when the hardware finished paying for itself.

It's pretty funny that everyone says this business is unsustainable, but I have yet to see anyone go bankrupt, even the pure hardware providers who are renting out A100s and B200s.

LetsGetTechnicl 57 minutes ago||
And AI investors and stock market boosters are just going to accept OpenAI not having anything "new" to show for all their investments? What about replacing hardware once it's been burned out from constant high usage? Is it not odd to you that so many big AI deals get announced and never heard from again? What's the business reason for neoclouds buying GPUs from NVIDIA only for NVIDIA to then pay them to rent them back? How does this make any sense?
operatingthetan 4 hours ago|||
They immediately undercut their argument to the point that I'm not sure if they were being sarcastic.
Rekindle8090 3 hours ago|||
[dead]
ilia-a 8 hours ago|||
Yeah, it is a massive jump in price, hardly a "Flash" model anymore... I wonder if they'll release flash lite or something with a bit more affordable price point.
OakNinja 6 hours ago||
There’s already a flash lite tier since 2.5. Latest is 3.1 currently.
irthomasthomas 8 hours ago|||
And they are using this to power search answers?
CooCooCaCha 7 hours ago||
I bet the API pricing helps pay for search users
llm_nerd 8 hours ago|||
It might be temporary pricing given that 3.5 Flash is actually superior to the existing 3.1 Pro in almost all regards, so they're in a bit of a lurch as 3.1 Pro really doesn't make sense given that 3.5 Pro has been delayed a bit.
dzhiurgis 2 hours ago|||
I use Gemini models in Junie daily. When I need accuracy I switch to Gemini 3.1 Pro Preview (why is it still in preview?), but it burns through credits, leaving me topping up $5 every day. 3.1 Flash Lite is just not accurate enough. 3 Flash is the sweet spot, just as JetBrains suggests.

Maybe I'll look at Opus again, but it was just slower, much more expensive and, worst of all, wasn't listening to your instructions.

SwellJoe 8 hours ago|||
That's a lot. DeepSeek v4 Flash is just over a tenth the price, and DeepSeek v4 Pro is roughly the same price (currently heavily discounted, but will be $1.74).

I mean, the benchmarks for Gemini 3.5 Flash are very strong, but at those prices it has to be. I guess the time of subsidized tokens from the big guys is slowly coming to an end.

copperx 5 hours ago||
They have said AI will be priced like a utility, meaning $100-300 per month or so.
verdverm 7 hours ago|||
At the same time, it is supposedly Gemini 3.1 Pro level at 3/4 the price

and far cheaper than comparable models, Gemini Pro is cheaper than Claude Sonnet (Anthropic still gets to charge a brand premium)

throwa356262 7 hours ago|||
Gemini 2.5 flash was the best Gemini model.

Not the most intelligent but perfect balance of cheap, fast and not-too-dumb.

m3kw9 7 hours ago||
just subscribe to the plan, cheaper
SXX 9 hours ago||

  > Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG

3.5 Flash: Thinking Medium - 7516 tokens

https://gistpreview.github.io/?5c9858fd2057e678b55d563d9bff0...

3.5 Flash: Thinking High - 7280 tokens

https://gistpreview.github.io/?1cab3d70064349d08cf5952cdc165...

3.1 Pro - 28,258 tokens

https://gistpreview.github.io/?6bf3da2f80487608b9525bce53018...

Though 3.1 Pro took 3 minutes of thinking to generate, it's the only one that got animated movement.

SXX 9 hours ago||
Gemini 3.1 Flash Lite Thinking High - 2,526 tokens:

https://gistpreview.github.io/?3496285c5dac5ba10ebbc0b201a1a...

Gemini 2.5 Pro - 5,325 tokens:

https://gistpreview.github.io/?cc5e0fefeaaffecd228c16c95e736...

Gemini 2.5 Flash - 7,556 tokens:

https://gistpreview.github.io/?263d6058fe526a62b8f270f0620ec...

Gemma 4 31B IT - 3,261 tokens via AI Studio:

https://gistpreview.github.io/?858a42b96af864859a3b89508619d...

Gemma 4 26B A4B IT - 4,034 tokens via AI Studio:

https://gistpreview.github.io/?4adb7703897e0c6b583f9de928e4a...

SXX 8 hours ago|||
Gemma 4 E4B it via Edge Gallery on pixel phone:

https://gistpreview.github.io/?da742884e5e830ce71ee4db877519...

OFC this is just for fun, but nevertheless gave me working code on first try.

segmondy 4 hours ago|||
I'm surprised the "they must have trained for it" camp is not here saying that rubbish.
franze 8 hours ago|||
Opus 4.7

https://claude.ai/public/artifacts/128ebe5a-add7-406a-9bce-6...

tasuki 7 hours ago||
Wow that's terrible. Any idea why?
lpa22 7 hours ago|||
Did you see the other ones? This is very good by comparison.
HDBaseT 5 hours ago||
Yeah, the oars being the wrong way around (inverted) is very distracting, but the other elements appear quaint and "accurate".
stingraycharles 5 hours ago|||
I think Anthropic optimizes less for visuals. Also, it’s not that terrible.
abtinf 9 hours ago|||
hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF @ Q6_K

8112 tokens @ 52.97 TPS, 0.85s TTFT

https://gistpreview.github.io/?7bdefff99aca89d1bc12405323bd4...

Full session: https://gist.github.com/abtinf/7bdefff99aca89d1bc12405323bd4...

Generated with LM Studio on a Macbook Pro M2 Max

https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6...

SXX 8 hours ago|||
Well, honestly this is quite impressive compared to 3.1 Flash Lite and 2.5 Pro. Considering that 2.5 Pro is actually quite good at generating massive amounts of code one shot.
svnt 7 hours ago|||
It isn’t animated at all for me?
kingstnap 3 hours ago|||
It is animated but the viewer is broken for some reason (tested Chrome latest windows).

This one works:

https://www.svgviewer.dev/s/04ipQgsU

SXX 7 hours ago|||
It is animated, just with no movement like in my 3.5 Flash examples. Maybe try a different browser, unless it's iOS.
vtail 8 hours ago|||
Here is GPT 5.5 High thinking; I had to add a second follow up prompt "it's not animated though" as the first one was not animated.

https://gistpreview.github.io/?557f979c82701862bc26d24f10399...

hskalin 4 hours ago|||
Why is it fixated on the front perspective? Interesting choice though, because most humans (and seems like other LLMs too) would pick a side perspective
vtail 8 hours ago|||
Here is a GPT 5.5 Extra High with a modified instruction:

> Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG. Use the Brave Browser to verifty that the image is indeed animated and looks like a proper rowing frog; iterate until you are satisfied with it.

It was able to discover and fix an animation bug, but the result is still far from perfect: https://gistpreview.github.io/?029df86d03bfe8f87df1e4d9ed2f6...

r0fl 1 hour ago|||
It’s shocking how much better 3.1 is than 3.5 flash

The benchmarks used don’t really give a full story

captn3m0 9 hours ago|||
All three links animate for me.
NitpickLawyer 9 hours ago||
I think they mean the boat is moving. In the flash ones the paddles are animated but the boat is stationary for me.
codazoda 9 hours ago||
The boat moves in all three for me
Fishkins 9 hours ago||
The boat itself rocks, but do you see the background changing to indicate the boat is progressing through the environment? I only see that in the 3.1 Pro example. I believe that's what the OP meant.
Manuel_D 9 hours ago||
I think this illustrates the problem with OP's prompt. If the goal is specifically to implement a scrolling background, this should have been in the prompt.
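The distinction being argued about here (the boat rocking in place versus the scene actually scrolling past) comes down to two separate animations. A minimal sketch, assuming SMIL `<animateTransform>` elements; the generated pages linked above may use CSS or JavaScript instead, and all shapes, colours and sizes below are invented for illustration.

```python
WIDTH, HEIGHT = 400, 200


def tree_tile(x_offset: int) -> str:
    """One screen-wide tile of background trees, shifted right by x_offset."""
    return "".join(
        f'<rect x="{x_offset + x}" y="60" width="12" height="80" fill="#5b3a1e"/>'
        f'<circle cx="{x_offset + x + 6}" cy="55" r="28" fill="#1f7a3a"/>'
        for x in (40, 160, 300)
    )


def rowing_scene_html(scroll_seconds: float = 8.0) -> str:
    """Self-contained HTML page: scrolling jungle background plus a rocking boat."""
    return f"""<!DOCTYPE html>
<html><body>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {WIDTH} {HEIGHT}" width="{WIDTH}" height="{HEIGHT}">
  <rect width="{WIDTH}" height="{HEIGHT}" fill="#0b3d2e"/>
  <rect y="140" width="{WIDTH}" height="60" fill="#2a6f97"/>
  <!-- Travel: two identical tiles slide left by one tile width, then loop -->
  <g>
    <animateTransform attributeName="transform" type="translate"
        from="0 0" to="-{WIDTH} 0" dur="{scroll_seconds}s" repeatCount="indefinite"/>
    {tree_tile(0)}{tree_tile(WIDTH)}
  </g>
  <!-- Rocking: the boat only tilts around its own centre and never moves -->
  <g>
    <animateTransform attributeName="transform" type="rotate"
        values="-3 200 150; 3 200 150; -3 200 150" dur="2s" repeatCount="indefinite"/>
    <ellipse cx="200" cy="150" rx="60" ry="14" fill="#8b5a2b"/>
    <circle cx="200" cy="130" r="12" fill="#4caf50"/>
  </g>
</svg>
</body></html>"""


html = rowing_scene_html()
print(len(html), "characters of HTML")
```

Writing `html` to a file and opening it in a browser shows both effects together; deleting the first `<g>` block's animation leaves only the rocking, which is roughly the "animated but not moving" result several replies describe.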
SXX 8 hours ago||
Yup. My bad. It was just the first idea that came to my mind, since I enjoy visually comparing each new release with unique prompts.
krupan 8 hours ago|||
These are hilarious. 3.5 Flash Thinking High is the only one that isn't weirdly deformed (what is going on with the hat in 3.1 Pro??)
stingraycharles 5 hours ago|||
3.5 Flash definitely got the synth wave vibe preference.
wslh 9 hours ago|||
Can you try with a more complex story such as "three little pigs"? I tried but it created a storybook instead of the SVG animation. I am looking to partially imitate Godogen [1][2] which is really great, even for animations.

[1] https://github.com/htdt/godogen

[2] https://drive.google.com/file/d/1ozZmWcSwieZQG0muYjbj7Xjhhlz...

SXX 7 hours ago||
I think it's unreasonable to expect models to generate complex stories from a single prompt since they're trained to be concise, but I tried. This is the prompt, added on top of the story, with a request for no control buttons:

   Now think, plan how to tell this story in a cartoon, make scene outline and then generate SVG animation story for "Three Little Pigs" in self contained HTML page. Just single animation no control buttons.

Full prompt in gist comments: https://gist.github.com/ArseniyShestakov/ed9faa53604035005ca...

Actual results for models, one shot:

Gemini 3.5 Flash - Three Little Pigs - 9,050 tokens:

https://gistpreview.github.io/?ed9faa53604035005cae86c63c766...

Gemini 3.1 Pro - Three Little Pigs - 24,272 tokens:

https://gistpreview.github.io/?f506bbfd9b4459c8cd55d89605af8...

Gemini 3 Flash - Three Little Pigs - 5,350 tokens:

https://gistpreview.github.io/?f58eff069cf916031c97d560b0e35...

Gemma 4 31B IT - Three Little Pigs - 5,494 tokens:

https://gistpreview.github.io/?a3aa75abbe8fd7818b73f6fa55ee6...

Gemma 4 26B A4B IT - Three Little Pigs - 6,375 tokens:

https://gistpreview.github.io/?1e631caebeb54f9f0cd6d0e3d4d5e...

no-name-here 53 minutes ago|||
3.1 pro was pretty good among them. (iOS)
ZeWaka 6 hours ago|||
Wow, Gemini 3.5 Flash surprised me there.
abi 9 hours ago||
Your links are broken FYI.
John7878781 9 hours ago||
They work for me.
TacticalCoder 9 hours ago||
They do work here too.
OhMeadhbh 8 hours ago||
Am I really so old that when someone says "Flash" my immediate response is... "consider HTML5 instead" ??
nightski 8 hours ago||
Very little of what made the Flash culture so fun made its way into HTML5.
CobrastanJorji 6 hours ago|||
I dunno, the tools are kind of there. Browsers have canvases and JavaScript and SVGs and sound. The communities are around; they're just kind of dispersed. There's no one website that is THE place for fun stuff. Instead, there are dozens, and most of them suck.

There's still fun stuff, though. I stumbled upon this bit of insanity just yesterday: https://tykenn.itch.io/trees-hate-you. It would have fit in fabulously with the old Flash sites.

moritzwarhier 6 hours ago|||
Edit: looks like you linked something created with Unity?

Not sure, I'm not versed in game dev. So maybe my point about creation tools is moot.

However, 3D content always seems very samey to me, in a way that cartoons and regular animation don't. So the rest of my comment should still express what I mean.

---

Flash had a WYSIWYG editor aimed at media creators who treat programming at best as an afterthought.

Flash was mostly about ease of tweening and extremely flexible vector graphics engine combined with an intuitive creation tool.

So the "Flash vs HTML/JS/SVG/CSS..." debate is not just about technical capabilities of the medium.

Of course there are many fun web apps in the browser, or as native apps, too. But Flash attracted all kinds of slightly nerdy people with cultural things to say, not just web devs with a lot of free time.

What "HTML5"/browser web technology doesn't offer is this intuitive, visual creation pipeline, and this kind of speaks for itself!

Also, I think the Flash "creator's" age is not separable from its time: using Flash wasn't trivial either.

There were just more people with interesting ideas, free time, and a wholistic talent for expressing their humor and ideas, combined with the curiosity and skill to learn using Flash (of course only as a licensed copy purchased from Macromedia).

People like this today are probably more often hyper-optimizing social media creators, and/or not terminally online.

In other words: I don't think the typical Newgrounds creator would have taken the time and effort to translate a stickman collage, meme, or other idea into a web app / animation.

---

And to add even more preaching: I think that "creating" things using AI produces exactly the opposite effect: feed it an original idea, and the result will be a regression to the mean.

Gigachad 5 hours ago|||
It's not quite the same but it seems the people who used to be publishing flash games are now making indie games on Steam. With modern dev tools and engines it's possible for one person to make what used to be a team effort before.

The whole "friendslop" genre is what replaced flash games.

sieabahlpark 7 hours ago|||
[dead]
pezgrande 6 hours ago|||
They were CPU killers but man those Flash websites were gorgeous (talking mostly about MU Online "private" servers)
winrid 5 hours ago||
It was probably the right call at the time with low bandwidth. Nowadays I bet flash would execute faster than most js heavy sites :D
guelo 4 hours ago||
It was not the right call, Steve Jobs was just a monopolist killing a competing platform and we're all worse off for it.
hedora 3 hours ago|||
I guess I'm slightly younger: I think "weights or it didn't happen"!
goatlover 7 hours ago|||
The Flash designer was really nice. One thing the web kind of set back was all the RAD tools from the 90s and 2000s.
OhMeadhbh 7 hours ago||
And there were some amazing RAD and prototyping tools in the 90s (mostly for DOS, but also for Windoze desktop apps.) You're right, we sort of gave up on the idea when everyone wanted to be seen as a "real" software engineer who knew how to sling Java on the back end.
_puk 7 hours ago|||
Lol. Young uns!

Flash, ah, ah, saviour of the universe. Flash, ah, ah, he'll save every one of us!

Every time I have heard the word flash for goodness knows how many years.

OhMeadhbh 7 hours ago||
If Google can reuse the "Flash" brand, I'm re-branding myself as "Meadhbh the Merciless."
wslh 3 hours ago||
Same here, and worse because in another thread users are generating animations.
gertlabs 1 hour ago||
Taking into account that this is a flash model, it's a strong release. It's very fast and frontier-ish for the price.

Raw intelligence is high for a flash model. But Google's problem has always been productization and tool use, whereas raw intelligence is always competitive. It does not look like they solved that with this release -- in fact, their tool use delta (the improvement in scores when given arbitrary tools and a harness) has actually regressed from some previous models.

Data at https://gertlabs.com/rankings

hmate9 7 hours ago||
I have the Google AI Pro plan and tried Antigravity with 3.5 Flash, but it used up all my quota in two prompts. If that is not a bug, then it is seriously unusable.
quirino 7 hours ago||
Yesterday, or the day before, Google lowered the AI Pro quota from 33x standard usage to 4x.

From the talk on the Gemini subreddit it's severely lower than before. I'm likely canceling my AI Pro.

The update also broke the app for me. Editing a message crashes the app every time. I'm on a Pixel lol

HDBaseT 4 hours ago||
The crunch is real.

- The model is approx. 3.3x cost.
- The model is realistically almost 5x cost due to token usage.
- Google has TPUs to run this on (yet the cost).
- Google has a lot more security and backup cash compared to all other AI companies, likely even combined (yet the cost).
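As napkin math, the two multipliers above can be reconciled if effective cost is simply per-token price times tokens consumed. That relationship is an assumption for illustration, and the token-usage figure it yields is inferred from the numbers quoted here, not measured:

```python
# Figures quoted in the list above; the token multiplier is inferred, not measured.
price_multiplier = 3.3      # per-token price relative to the previous model
effective_multiplier = 5.0  # "realistically almost 5x" total cost


def implied_token_multiplier(price_mult: float, effective_mult: float) -> float:
    """Token-usage growth needed to turn a price increase into the effective cost increase.

    Assumes effective cost = price per token * tokens used.
    """
    return effective_mult / price_mult


token_multiplier = implied_token_multiplier(price_multiplier, effective_multiplier)
print(f"implied token usage: {token_multiplier:.2f}x")
```

Under that assumption the model would need to spend roughly one and a half times as many tokens per task to account for the gap between the list price and the observed cost.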

We can continue moving the goal posts, but it seems we're at a bit of a wall. Costs are increasing, intelligence is improving, but the cost is rising drastically.

You'd think Google of all companies in the mix would be able to sustain lower costs with how integrated they are with TPU, Deepmind and effectively unlimited budget.

babl-yc 4 hours ago|||
I'm seeing this too.

API price for gemini-3.5-flash is 3x gemini-3-flash-preview so they might be throttling it 3x sooner. They should either drop API prices or not advertise AI Pro as supporting Antigravity.

https://ai.google.dev/gemini-api/docs/pricing#gemini-3.5-fla...

moral1ty 7 hours ago||
[dead]
lanewinfield 8 hours ago||
Gemini 3.5 Flash's 2000 token clocks aren't bad. https://clocks.brianmoore.com/
acters 4 hours ago|
Fascinating, Kimi K2 has a good clock too, from my limited time on the site.
nl 2 hours ago||
On my Agentic SQL benchmark it scores 19/25. That's... mediocre.

That means it performs worse than 3.1 Flash Lite Preview (22/25), is slower (367s vs 142s) and is more expensive (75c vs 2c).

It is outperformed by Gemma4 26B-A4B in every way(!)

https://sql-benchmark.nicklothian.com/?highlight=google_gemi...

(Switch to the cost vs performance chart to see how far this is off the Pareto frontier)
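For anyone unsure what "off the Pareto frontier" means here, a minimal sketch of the dominance check, using only the two sets of numbers quoted above. The `Result` class and `dominates` function are names made up for this illustration and are not part of the benchmark site:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    score: int      # correct answers out of 25 (higher is better)
    seconds: float  # total runtime (lower is better)
    cents: float    # total cost in cents (lower is better)


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = a.score >= b.score and a.seconds <= b.seconds and a.cents <= b.cents
    strictly_better = a.score > b.score or a.seconds < b.seconds or a.cents < b.cents
    return no_worse and strictly_better


flash_35 = Result("Gemini 3.5 Flash", 19, 367, 75)
flash_lite_31 = Result("3.1 Flash Lite Preview", 22, 142, 2)

print(dominates(flash_lite_31, flash_35))  # True
print(dominates(flash_35, flash_lite_31))  # False
```

A model is on the frontier only if no other entry dominates it, so a single dominating entry is enough to push it off.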

reconnecting 8 hours ago|
Knowledge cutoff: January 2025

Latest update: May 2026

I have a very bad feeling about this lag.

SwellJoe 7 hours ago||
At least in some cases, there seems to be a move toward training on more synthetic data and strictly curated data, especially for smaller models where knowledge can't be extremely broad, because there just isn't enough room to store the world in tens or hundreds of gigabytes of model weights. So, to achieve higher quality reasoning, the training has to be focused and the data has to be very high quality and high density.

With strong tool use, it maybe doesn't even matter that the models are using older data. They can search for updated information. Though most models currently don't, without a little nudge in that direction.

Also, I believe the Qwen 3 series are all based on the same base model, with just fine-tuning/post-training to improve them on various metrics. Maybe everything in the Gemini 3 series is the same, and maybe they're concurrently training the Gemini 4 base model with updated knowledge as we speak.

reconnecting 7 hours ago||
> it maybe doesn't even matter that the models are using older data.

This actually really does matter. Otherwise, the model simply won't know about your product and will always suggest only a few market leaders.

Searching for information on the Internet became a jungle a decade ago, and to be visible you have to pay Google for sunlight. Now, we risk falling into real darkness — until some paid model eventually emerges. This might be the reason Google is fine with training data from 2024. If the top spot is reserved for whoever pays anyway, why bother?

SwellJoe 6 hours ago||
That's a different problem than I thought you were worried about. I wasn't considering the marketing angle, though that is certainly relevant and a risk to consider, especially when it comes to Google, whose primary businesses are ads and surveillance.
hosel 8 hours ago|||
Can you explain what you mean?
reconnecting 8 hours ago|||
LLM pre-training models risk being unable to be updated with data from after 2025, as much of it is corrupted with LLM-generated content. We might be locked into outdated knowledge, where only whitelisted sources decide what to include.

Taking into account the sometimes blind belief that 'LLMs know everything', the outcome could be very costly, especially for technologies and businesses unfortunate enough to emerge after 2025.

agnosticmantis 46 minutes ago|||
It may not be mainly or solely due to LLM pollution, but rather the fact that every publisher, (social) media company, newspaper, etc. clammed up and started charging (licensing) fees sometime in the last couple of years.

So maybe there's just not much openly available and new content worth training on that wasn't available prior to 2025.

neksn 6 hours ago||||
Considering all models can use search engines, is this really relevant?
Culonavirus 3 hours ago|||
This is not meant as an insult, but have you actually LLM/vibe coded anything that used a fast(-ish) moving library or framework? Try asking your favorite LLM with say Jan 2025 knowledge cutoff (or pretraining data cutoff, whatever you want to call it) to work on something using a framework that had a big rewrite later that year (which would make it one year old now, which is like ages in the LLM coding era)... It's a nightmare full of wrestling with the LLM when you try to tell it the version of the framework and that it changed a lot from the previous version and yadda yadda long story short down the thread when context runs out and/or is compressed it begins to forget detailed instructions and just falls back to pulling out old patterns it "remembers" from pretraining. And so you need to constantly remind it what you work with and "oh hey this doesn't work because we're working with react router v7 in framework mode, remember? not react router v6". Or try to use the latest non-LTS/breaking version of a library, at first it looks it up online, but again as you get deeper into the weeds and little details, the struggle begins.

So, as far as I'm concerned, training cutoff is still a big deal.

dinfinity 2 hours ago||
> It's a nightmare full of wrestling with the LLM when you try to tell it the version of the framework and that it changed a lot from the previous version and yadda yadda

Tip: Add a default instruction to look at the actual downloaded source code of the dependencies used (assuming you're not dealing with closed source dependencies). Have the agent treat it as your own (readonly) source code instead of relying on model training data and possibly mismatching documentation on the web. Then it just greps for the exact function signatures and reads the file-based documentation.
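A rough sketch of what that grep step amounts to, written as a small Python helper. The function name `grep_dependency` and the default suffix list are made up for this example; in practice the agent would more likely just run `grep` or `rg` directly:

```python
import re
from pathlib import Path


def grep_dependency(dep_root, symbol, suffixes=(".ts", ".js", ".py", ".md")):
    """Return (file, line number, line) for every whole-word match of `symbol`.

    dep_root is the directory where the dependency was actually installed,
    so the matches reflect the version in use, not training data.
    """
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    hits = []
    for path in sorted(Path(dep_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Hypothetical usage; the install path depends on your project layout:
# for hit in grep_dependency("node_modules/react-router", "createBrowserRouter"):
#     print(hit)
```

The point is that the signatures come from the files on disk, so they match whatever version is pinned in the project.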

reconnecting 6 hours ago|||
Until they prefer not to search. Let me explain using the example of the open-source security framework (1) our team is working on.

If you ask Gemini what you should use to integrate fraud prevention or account takeover protection into your product, there will be no mention of our open-source project. Five years in development, 1.3k stars, over 140 pull requests — all this isn't enough to make it into the training data. From this perspective, any technology that emerges after 2024 is simply invisible to LLMs.

The answer is: without being in the training data, LLMs basically don't understand what they're searching for.

1. https://github.com/tirrenotechnologies/tirreno

ordersofmag 5 hours ago||
I just put the terribly generic query "what tools would you recommend to integrate fraud prevention or account takeover protection into my product" into both Claude (Sonnet) and Gemini (3.1 Pro) via the standard web interface and both took the first step of searching the web. That's consistent with my past experience -- the usual harnesses typically will search the web in cases where I might expect/want them to. Now whether your product has good web visibility or not in those searches and how the LLMs weigh the relative merits of open-source tools versus commercial offerings in deciding what to highlight in their responses is a different issue. As is the change in what constitutes effective SEO in an era where bots, rather than human eyes, are the proximal important target. But I don't think the core issue with folks finding your products is the move away from user-driven search toward using models with out-of-date training cutoffs.

FWIW while neither model included your product in its initial response, when I followed up with "what about open-source" both did another search and Claude's response included your tool....

Pikamander2 5 hours ago|||
But ChatGPT has been popular since early 2023, and even before it there was no shortage of low-quality content on the web.

If anything, this model being trained up to 2025 is a positive sign that the "circular LLM training" problem hasn't (yet) become unmanageable.

The year-long delay is probably just due to how long it takes to test/refine a cutting-edge model. It's surely possible to train one faster, but Google wouldn't want to release a new model unless it's going to top the usual benchmarks.

djeastm 5 hours ago||
Looking at token usage at places like OpenRouter as a proxy for overall production we're looking at exponential growth in AI-created content. Weekly token usage there has tripled just in the past 3 months.
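To put "tripled in 3 months" in perspective, here is the implied rate if the growth really is exponential and holds steady (both are assumptions; a single 3-month window can't confirm either):

```python
import math

growth_factor = 3.0   # usage tripled...
period_months = 3.0   # ...over this many months

# Constant-rate exponential growth: usage(t) = usage(0) * growth_factor ** (t / period_months)
monthly_factor = growth_factor ** (1 / period_months)
doubling_months = period_months * math.log(2) / math.log(growth_factor)
annual_factor = growth_factor ** (12 / period_months)

print(f"monthly growth: {monthly_factor:.2f}x")
print(f"doubling time:  {doubling_months:.2f} months")
print(f"annualized:     {annual_factor:.0f}x")
```

That works out to a doubling roughly every two months, or about 81x per year if nothing changes, which is unlikely to hold for long.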
nemomarx 8 hours ago|||
It might indicate core model training and pre training is really slowing down?
mixtureoftakes 8 hours ago||
also parsing is harder + so much more of the new data is being generated by ai itself.

still the cutoff is very much concerning and inconvenient

yoda7marinated 8 hours ago|||
I thought that was a choice that Google made?
verdverm 7 hours ago||
you really shouldn't have them pulling facts from their weights, they need grounding from real data sources