Posted by kevinsimper 13 hours ago
Is this normal humans kicking the tires on a new model, or a few whales doing serious benchmarks?
Open-weight: Good enough for the majority of tasks, and I'm willing to spend a bit more time and effort steering towards my desired result.
4.7 is much better. But perception is a funny thing, once you think something is bad you start looking for it everywhere.
The setup I had to do was important and I had to compile koboldcpp with a few special params for my hardware, I mostly just had Claude figure it out. I don't remember everything I did now but it was very slow and would often stop mid task, it seems it was mostly a parsing issue. It made the model seem broken/dumb, but once I had all that settled I actually am able to use this how I use Claude Code. Disclaimer, I am pretty explicit with requirements, I imagine this fails more when you leave it to figure out things on its own but for my flow its pretty rad.
Currently setting it up as an automated agent now to pull Trello cards, create PRs for them, and move the card to be reviewed.
Command I am using to run: python koboldcpp.py \ --port 61514 --quiet --multiuser --gpulayers 999 --contextsize 262144 --quantkv 2 \ --usecublas normal --threads 4 --jinja --jinja_tools --jinja_kwargs '{"enable_thinking":true, "preserve_thinking":false}' \ --skiplauncher --model /data/models/Qwen3.6-27B-Q5_K_M.gguf --smartcache 5
It's very capable on almost any coding task I've thrown at it, and very good for easy-to-medium hard scripts, new code bases.
It struggles on some complex tasks in larger code bases, e.g. using to debug and fix bugs in llama.cpp it gets close to working code but often introduces errors. For such tasks its still very useful as a search/explore tool and drafting fixes.
Good balance of intelligence and speed.
I had a Google Pro account that I inherited from buying a Pixel 9 XL - it's free for a year after a flagship Pixel phone purchase. After a year they started charging for it, and i tolerated it, because Flash was usable in Antigravity for dumb auxiliary tasks that I did not want to waste GPT/Opus on. It had a separate generous quota from Gemini 3.1 Pro. Now with Flash 3.5 they combined the quotas with Pro, such that on a Google pro account you can work 4-5 hours per week in Flash. And by the way, 3.1 Pro is useless for programming, compared to Codex/Opus
I think they envision Pro plan as "just a taste of AI, enough to lure folks into the Ultra plan" but that won't work for me when Codex is half the price and DeepSeek 4 Flash is 1/10 of their price per task.
So I'll downgrade just enough to keep my Google Drive space. And use DeepSeek 4 as workhorse plus Codex or Copilot for advanced stuff.
https://marketplace.visualstudio.com/items?itemName=sst-dev....
It adds a button to VSCode to open a tab with opencode loaded. It's a bit better than just opening the CLI because it has some vscode integration.
With their $10/mo opencode go plan: https://opencode.ai/go
For my use it's about endless use of DS4 Flash on high setting. I find high better than max because it's less chatty.
The best thing is the speed. So many tokens per second.
edit: This is how it looks in action https://i.imgur.com/RNDXr07.png
I haven't tested openrouter but I expect it to be slightly less cheap because it charges per token and opencode Go plan is a $10/mo fixed price model. Economies of scale leads me to think that for heavy use, openrouter will be more costly since opencode Go can subside heavy users like me with money from light users (just like gyms do with people that pay but barely use it).
With that said, I find vscode native copilot chat more pleasant to use, but also more laggy for large sessions.
opencode configuration is less polished and you'll have to grok around for some things. For example opencode CTRL+p conflicts with VSCode CTRL+p. I changed opencode to use Ctrl+L instead.