Posted by greenstevester 3 days ago

April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini(gist.github.com)
275 points | 107 comments | page 3
krzyk 3 days ago|
By "desk" do you mean that Mac mini? Because it is pricey. In my country it is 1000 USD (from Apple, for the basic M4 with 24GB). My desk was 1/5th of that price.

And considering that this Mac mini won't be doing anything else, is there a reason not to just buy a subscription from Claude, OpenAI, Google, etc.?

Are those open models more performant than Sonnet 4.5/4.6? Or do they at least have a bigger context?

lambda 3 days ago|||
Right now, open models that run on hardware that costs under $5000 can get up to around the performance of Sonnet 3.7. Maybe a bit better on certain tasks if you fine tune them for that specific task or distill some reasoning ability from Opus, but if you look at a broad range of benchmarks, that's about where they land in performance.

You can get open models that are competitive with Sonnet 4.6 on benchmarks (though some people say that they focus a bit too heavily on benchmarks, so maybe slightly weaker on real-world tasks than the benchmarks indicate), but you need >500 GiB of VRAM to run even pretty aggressive quantizations (4 bits or less), and to run them at any reasonable speed they need to be on multi-GPU setups rather than the now discontinued Mac Studio 512 GiB.
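The ">500 GiB" figure follows from simple arithmetic on weight storage. A back-of-the-envelope sketch, assuming a hypothetical model of roughly 1 trillion parameters at 4 bits per weight (the parameter count and the 10% overhead factor are illustrative assumptions, not numbers from the comment):

```python
def weight_memory_gib(n_params: float, bits_per_weight: float,
                      overhead: float = 1.1) -> float:
    """Approximate memory needed to hold quantized model weights.

    `overhead` is a rough fudge factor for quantization scales/zero-points
    plus KV cache and activations (assumed ~10% here).
    """
    bytes_total = n_params * bits_per_weight / 8 * overhead
    return bytes_total / 2**30  # bytes -> GiB

# A hypothetical ~1-trillion-parameter model at 4 bits per weight:
print(round(weight_memory_gib(1e12, 4), 1))  # ~512 GiB
```

Halving the bit width halves the footprint, which is why "4 bits or less" is the only way such models squeeze into half a terabyte.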

The big advantage is that you have full control: you're not paying a $200/month subscription and still being throttled on tokens, you're guaranteed that your data is not being used to train models, and you're not financially supporting an industry that many people find questionable. Also, if you want to, you can use "abliterated" versions, which strip away the censoring that labs apply to make their models refuse to answer certain questions, or you can use fine-tunes that adapt the model for various other purposes, like improving certain coding abilities, making it better for roleplay, etc.

zozbot234 3 days ago||
You don't need that much VRAM to run the very largest models: these are MoE models where only a small fraction of the weights is active at any given time. If you run with multiple GPUs and have enough PCIe lanes (such as on a proper HEDT platform), CPU-to-GPU transfers start to become a bit less painful. More importantly, streaming weights from disk becomes feasible, which lets you save on expensive RAM. The big labs only avoid this because at scale it costs more power than keeping weights in DRAM, but that aside it's quite sound.
lambda 3 days ago||
While you can run with weights in RAM or even on disk, it gets a lot slower: even though only a fraction of the weights is used for any given token, that fraction can change with each token, so there is a lot of traffic transferring weights to the GPU, which is much slower than having them directly in GPU RAM. And slower still if you stream from disk. Possible, yes, and maybe OK for some purposes, but you might find it painfully slow.
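The slowdown can be sketched with a bandwidth ceiling: in the worst case, every decoded token has to move its active expert weights across the link. The active-parameter count and bandwidth figures below are illustrative assumptions (roughly typical of a large MoE and of GPU VRAM vs. a PCIe 4.0 x16 link), not measurements:

```python
def tokens_per_sec_ceiling(active_params: float, bits_per_weight: float,
                           bandwidth_gbs: float) -> float:
    """Upper bound on decode speed if each token must stream its full set
    of active expert weights over a link with the given bandwidth (GB/s)."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

active = 30e9  # assumed ~30B active params per token for a large MoE
# Weights resident in GPU VRAM (~1000 GB/s) vs streamed over PCIe 4.0 x16 (~32 GB/s):
print(round(tokens_per_sec_ceiling(active, 4, 1000), 1))  # ~66.7 tok/s ceiling
print(round(tokens_per_sec_ceiling(active, 4, 32), 1))    # ~2.1 tok/s ceiling
```

In practice expert overlap between consecutive tokens and caching soften the worst case, but the order-of-magnitude gap between the two ceilings is the point.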
zhongwei2049 3 days ago|||
I have the same setup (M4 Pro, 24GB). The e4b model is surprisingly snappy for quick tasks. The full 26B is usable but not great — loading time alone is enough to break your flow.

Re: subscriptions vs local — I use both. Cloud for the heavy stuff, local for when I'm iterating fast and don't want to deal with rate limits or network hiccups.

mark_l_watson 3 days ago|
The article has a few good tips for using Ollama. Perhaps it should note that the Gemma 4 models are not really trained for strong performance with coding agents like OpenCode, Claude Code, pi, etc. The Gemma 4 models are excellent for applications requiring tool use, data extraction to JSON, etc. I asked Gemini Pro about this earlier and Gemini Pro recommended qwen 3.5 models specifically for coding, and backed that up with interesting material on training. This makes sense, and is something that I do: use strong models to build effective applications using small efficient models.
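For the tool-use and JSON-extraction case described above, Ollama's chat endpoint accepts a `format` field that constrains the model's output to valid JSON. A minimal sketch of the request payload (the model tag and the extraction task are assumptions; only the payload construction is shown, with the actual POST left as a comment so nothing here requires a running server):

```python
import json

# Hypothetical extraction task: pull structured fields out of free text.
payload = {
    "model": "gemma4:26b",  # assumed tag; substitute whatever your Ollama install lists
    "messages": [
        {"role": "user",
         "content": "Extract name and city as JSON from: "
                    "'Alice moved to Lisbon last spring.'"}
    ],
    "format": "json",  # ask Ollama to constrain output to valid JSON
    "stream": False,
}
body = json.dumps(payload)

# To actually send it to a local Ollama server:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/chat", data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

This pattern — a strong model designing the prompt, a small local model doing the constrained extraction — is the "strong models building applications on small efficient models" workflow the comment describes.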
Aurornis 3 days ago||
> I asked Gemini Pro about this earlier and Gemini Pro recommended qwen 3.5 models specifically for coding, and backed that up with interesting material on training.

The Gemma models were literally released yesterday. You can’t ask LLMs for advice on these topics and get accurate information.

Please don’t repeat LLM-sourced answers as canonical information.

zozbot234 3 days ago|||
It's not just LLM sourced though, folks have literally tried this after the release with the 26A4B model and it wasn't very good. Maybe the dense ~31B model is worthwhile though.
Aurornis 3 days ago||
Many Gemma implementations are or were broken on launch day. The first attempts to fix llama.cpp’s tokenizer were merged hours ago.

Everyone hated Qwen3.5 at launch too because so many implementations were broken and couldn’t do tool calling.

You need to ignore social media “I tried this and it sucks” echo chambers for new model releases.

mark_l_watson 3 days ago||
I agree with your criticism. I should have simply said that I had good results with gemma 4 tool use, and agentic coding with gemma 4 didn’t yet work well for me.
mark_l_watson 3 days ago||||
I spent two hours doing my own research before asking for Gemini’s analysis, which reinforced my own opinion that the Gemma models historically have not been trained and targeted for agentic coding use.

Have you tried using the new Gemma 4 models with agentic coding tools? If you do, you might end up agreeing with me.

SparkyMcUnicorn 3 days ago||
I've found my research on certain topics like this becoming less reliable these days, compared to just trying it out to form an opinion.
mark_l_watson 3 days ago||
I wasn’t very clear, sorry. By my ‘own research’ I meant spending 90 minutes experimenting with Gemma 4 models for tool use (good results!) and a half hour using them with pi and OpenCode (I didn’t get good results, yet).
armchairhacker 3 days ago|||
LLMs can search the web. Although I don’t trust the LLM (or someone repeating its claim) without quotes and URLs to where it got the information.
renewiltord 3 days ago||
Oh yeah absolute genius. I asked GPT-2 about Claude Opus 4.6 and it said “this is not a recommendation. You might get some benefits from Opus… but this is not what you want”. Damn, real wisdom from the OG there. What a legend