Posted by MallocVoidstar 1 day ago
Card: https://deepmind.google/models/model-cards/gemini-3-1-pro/
ETA: They apparently wiped out everyone's chats (including mine). "Our engineering team has identified a background process that was causing the missing user conversation metadata and has successfully stopped the process to prevent further impact." El Mao.
opencode models --refresh
Then /models and choose Gemini 3.1 Pro.
You can use the model through OpenCode Zen right away and avoid that Google UI craziness.
---
It is quite pricey! Good speed, and it has nailed all my tasks so far. For example:
@app-api/app/controllers/api/availability_controller.rb
@.claude/skills/healthie/SKILL.md
Find Alex's id, and add him to the block list, leave a comment
that he has churned and left the company. We can't disable him
properly on the Healthie EMR for now, so
this dumb block will be added as a quick fix.
Result was: 29,392 tokens
$0.27 spent
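For scale, a quick back-of-the-envelope on the blended rate (the input/output split isn't broken out above, so this lumps them together):

  # Blended cost per million tokens from the run above: 29,392 tokens for $0.27.
  tokens = 29_392
  cost_usd = 0.27
  print(f"${cost_usd / tokens * 1e6:.2f} per 1M tokens")  # prints: $9.19 per 1M tokens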
So a relatively small task, hitting an API, using one of my skills, but it cost a quarter. Pricey!
---
I get the impression that Google is focusing on benchmarks without assessing whether the models are actually improving in practical use cases.
I.e., they are benchmaxing.
Gemini is "in theory" smart, but in practice it is much, much worse than Claude and Codex.
However, I heavily use Gemini in my daily work and I think it has its own place. Ultimately, I don't see the point of picking the one "best" model for everything; I'd rather use whatever is best for a given task.
Which cases? Not trying to sound harsh, but you didn't even provide examples of the cases you use Claude/Codex/Gemini for.
Gemini can go off the rails SUPER easily. It just devolves into a gigantic mess at the smallest sign of trouble.
For the past few weeks, I've also been using XML-like tags in my prompts more often, sometimes sharing previous conversations with `<user>` and `<assistant>` tags. Opus/Sonnet handles this just fine, but Gemini has a mental breakdown. It'll just start talking to itself.
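To be concrete, here is a hypothetical sketch of that tagged-transcript style, written as a Python string (the turn contents are invented for illustration, not from the original poster):

  # Hypothetical example of sharing a prior conversation via XML-like tags.
  # Per the comment above, Opus/Sonnet handle this format fine; Gemini derails on it.
  prompt = """\
  <user>
  How should we handle rate limiting on the sync endpoint?
  </user>
  <assistant>
  Use a token bucket per API key and return 429 with a Retry-After header.
  </assistant>
  <user>
  Good. Now draft the middleware for that.
  </user>
  """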
Even in sessions with nothing out of the ordinary, it goes crazy. After a while, it'll start saying it's going to do something, and then it pretends it has already done that thing, all in the same turn. A turn that never ends. Eventually it just starts spouting repetitive nonsense.
And you would think this is just because the bigger the context grows, the worse models tend to get. But no! This can happen well below even the 200,000-token mark.
For development I tend to use Antigravity with Sonnet 4.5, or Gemini Flash if it's about a GUI change in React. Gemini's layout and design work has been superior to the Claude models' in my opinion, at least as of when I compared them. Flash is also significantly faster.
And all of it is essentially free for now. I can even select Opus 4.6 in Antigravity, but I haven't given it a try yet.
Agreed, Gemini as a model is fairly incompetent inside Google's own CLI tool as well as in opencode, but I find it useful as a research and document-analysis tool.
The models are all close enough on the benchmarks, and I think people are attributing too much of the difference in the agentic space to the model itself. I strongly believe the difference is in all the other stuff, which is why Anthropic is far ahead of the competition. They have done great work with Claude Code, Cowork, and their knowledge sharing through docs and blog posts, bar none on this last point imo.
https://www.google.com/appsstatus/dashboard/incidents/nK23Zs...
I'm biased, I don't trust either of them, so perhaps I'm just looking hard for the hate and attributing all the positive stuff to advertising.