
Posted by jeffmcjunkin 9 hours ago

Google releases Gemma 4 open models(deepmind.google)
1075 points | 329 comments
stefs 3 hours ago|
I get a lot of tool-call errors with gemma-4-26b-a4b because the tokens don't seem to match up.
vigneshj 2 hours ago||
Great one to have
0xbadcafebee 7 hours ago||
Gemma 3 models were pretty bad, so hopefully Gemma 4 at least comes close to the other major open-weight models.
nolist_policy 7 hours ago|
Bad at coding. Good for everything else.
flakiness 9 hours ago||
It's good they still have non-instruction-tuned models.
james2doyle 9 hours ago||
Hmm just tried the google/gemma-4-31B-it through HuggingFace (inference provider seems to be Novita) and function/tool calling was not enabled...
james2doyle 9 hours ago||
Yeah you can see here that tool calling is disabled: https://huggingface.co/inference/models?model=google%2Fgemma...

At least, as of this post

linolevan 9 hours ago||
Hosted on Parasail + Google themselves (both free, as of now); I'd probably give those a shot.
rvz 9 hours ago||
Open-weight models are once again marching on, slowly becoming a viable alternative to the larger ones.

We are at least 1 year and at most 2 years until they surpass closed models for everyday tasks that can be done locally to save spending on tokens.

echelon 9 hours ago|
> We are at least 1 year and at most 2 years until they surpass closed models for everyday tasks that can be done locally to save spending on tokens.

Until they pass what closed models today can do.

By that time, closed models will be 4 years ahead.

Google would not be giving this away if they believed local open models could win.

Google is doing this to slow down Anthropic, OpenAI, and the Chinese, knowing that in the fullness of time they can be the leader. They'll stop being so generous once the dust settles.

ma2kx 8 hours ago|||
I think it will be less of a local versus cloud situation, but rather one where both complement each other. The next step will undoubtedly be for local LLMs to be fast and intelligent enough to allow for vocal conversation. A low-latency model will then run locally, enabling smoother conversations, while batch jobs in the cloud handle the more complex tasks.

Google, at least, is likely interested in such a scenario, given their broad smartphone market. And if their local Gemma/Gemini-nano LLMs perform better with Gemini in the cloud, that would naturally be a significant advantage.

pxc 4 hours ago||||
If they pass what closed models today can do by much, they'll be "good enough" for what I want to do with them. I imagine that's true for many people.
jimbokun 8 hours ago||||
But at that point, won't there be very few tasks left where the average user can discern the difference in quality?
pixl97 9 hours ago|||
I mean, correct, but running open models locally will still massively drop your costs even if you still need to interface with the large paid models. At the end of the day, Google will still make less money than if theirs were the only model that existed.
virgildotcodes 8 hours ago||
Downloaded through LM Studio on an M1 Max 32GB, 26B A4B Q4_K_M

First message:

https://i.postimg.cc/yNZzmGMM/Screenshot-2026-04-03-at-12-44...

Not sure if I'm doing something wrong?

This more or less reflects my experience with most local models over the last couple years (although admittedly most aren't anywhere near this bad). People keep saying they're useful and yet I can't get them to be consistently useful at all.

solarkraft 8 hours ago||
Wow, just like its larger brother!

I had a similarly bad experience running Qwen 3.5 35b a3b directly through llama.cpp. It would massively overthink every request. Somehow in OpenCode it just worked.

I think it comes down to temperature and such (see daniel's post), but I haven't messed with it enough to be sure.
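For reference, sampling parameters like temperature can be set per request against llama.cpp's OpenAI-compatible server (llama-server). A minimal sketch of such a request body, assuming a local server at the default port; the values are illustrative, not the model's recommended settings:

```python
import json

def build_chat_request(prompt, temperature=1.0, top_p=0.95, top_k=64):
    """Build a request body for llama-server's /v1/chat/completions."""
    return {
        "model": "local",  # llama-server accepts any model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # higher = more random sampling
        "top_p": top_p,              # nucleus-sampling cutoff
        "top_k": top_k,              # llama.cpp extension to the OpenAI schema
    }

payload = build_chat_request("Hello", temperature=0.7)
body = json.dumps(payload)
# POST body to http://localhost:8080/v1/chat/completions (not done here)
```

If a model overthinks or rambles at the defaults, lowering temperature here is usually the first knob to try.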

flux3125 7 hours ago||
You're not doing anything wrong, that's expected
gunalx 6 hours ago||
We didn't get DeepSeek v4, but Gemma 4. Can't complain.
DeepYogurt 8 hours ago||
Maybe a dumb question, but what does the "it" stand for in 31B-it vs 31B?
bigyabai 8 hours ago|
Instruction Tuned. It indicates that thinking tokens (eg <think> </think>) are not included in training.
flux3125 7 hours ago|||
That's not what it means. "-it" just indicates the model is instruction-tuned, i.e. trained to follow prompts and behave like an assistant. It doesn't imply anything about whether thinking tokens like <think>...</think> were included or excluded during training. That's a separate design choice and varies by model.
DeepYogurt 7 hours ago||
What does that mean for a user of the model? Is the "-it" version more direct with solutions or something?
petu 4 hours ago|||
It means the model was tuned to act as a chat bot: it writes a reply on behalf of the assistant and then stops generating (by inserting a special "end of turn" token that signals the inference engine to stop generation).

A base model (without instruction/chat tuning) just generates text non-stop ("autocomplete on steroids"), and that text is not necessarily even formatted as a chat -- most text in the training data isn't dialogue, after all.
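Concretely, the chat tuning pairs with a prompt template. A minimal sketch of a Gemma-style turn format (the <start_of_turn>/<end_of_turn> token names follow Gemma's published template; the helper itself is illustrative):

```python
def format_gemma_chat(messages):
    """Render a message list into a Gemma-style chat prompt."""
    prompt = ""
    for msg in messages:
        # Gemma uses "model" rather than "assistant" for its own turns
        role = "model" if msg["role"] == "assistant" else msg["role"]
        prompt += f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n"
    # Open the model's turn; the instruction-tuned model learns to emit
    # <end_of_turn> here, which tells the inference engine to stop.
    prompt += "<start_of_turn>model\n"
    return prompt

p = format_gemma_chat([{"role": "user", "content": "Hi"}])
```

A base model never learned to emit the end-of-turn token, which is why it keeps autocompleting instead of handing the turn back.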

nolist_policy 6 hours ago|||
Use the -it versions. The other versions are base models without post-training; e.g. base models are trained to regurgitate raw Wikipedia, books, etc. These base models are then post-trained into instruction-tuned models, where they learn to act as a chat assistant.
daveguy 7 hours ago|
FYI, it took me a while to find the meaning of the "-it" in some models. That's how Google designates "instruction tuned". Come on, Google. Define your acronyms.
More comments...