Posted by MallocVoidstar 10 hours ago
Card: https://deepmind.google/models/model-cards/gemini-3-1-pro/
1. Unreliable in GH Copilot: lots of 500 and 4XX errors. Unusable in the first 2 months.
2. Not available in Vertex AI (Europe), and we have data-residency requirements. Funnily enough, Anthropic is on point with releasing their models to Vertex AI; we already use Opus and Sonnet 4.6.
I hope Google gets their act together and understands that not everyone wants to (or can) use their global endpoint. We'd like to try their models.
It's only February...
How?
It's a bit hard to trick reasoning models, because they explore many angles of a problem and might accidentally have an "a-ha" moment that puts them on the right path. It's a bit like random sampling: pick lots of starting points, run gradient descent from each, and stumble on the right result from one of them.
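The random-sampling analogy can be sketched concretely. Everything below is invented for illustration (the toy objective, the step size, the sampling range): a single gradient descent can get stuck in a local minimum, but descending from many random starting points and keeping the best result usually finds the global one.

```python
import random

def f(x):
    # Toy multimodal objective: a local minimum near x = -2 and a
    # deeper, global minimum near x = 3 (the -2x tilt breaks the tie).
    return ((x + 2) ** 2) * ((x - 3) ** 2) - 2 * x

def grad(x, h=1e-6):
    # Numerical gradient via central differences.
    return (f(x + h) - f(x - h)) / (2 * h)

def descend(x, lr=1e-3, steps=2000):
    # Plain gradient descent from a single starting point; it converges
    # to whichever basin the starting point happens to sit in.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def random_restart_descent(n_samples=20, seed=0):
    # Sample random starting points, descend from each, keep the best.
    rng = random.Random(seed)
    candidates = [descend(rng.uniform(-6.0, 6.0)) for _ in range(n_samples)]
    return min(candidates, key=f)

best = random_restart_descent()
```

With enough samples, at least one lands in the basin of the global minimum near x = 3, even though a descent started near x = -4 would settle in the shallower local minimum.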
I am trying to think of the best way to give the most information about how the AI models fail, without revealing information that could help them overfit on those specific tests.
I am planning to add some extra LLM calls to summarize the failure reason without revealing the test.
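A minimal sketch of what such a summarization call might look like. Everything here is hypothetical (the `summarize_failure` helper, the prompt wording, the injected `llm` callable standing in for whatever chat API is used); the point is that the prompt carries only an abstract description of the expected behavior, never the literal test case.

```python
from typing import Callable

def summarize_failure(
    model_answer: str,
    expected_behavior: str,
    llm: Callable[[str], str],
) -> str:
    # Ask a second model to categorize the failure. The literal test
    # input is deliberately absent, so the summary can be published
    # without giving future models something to overfit on.
    prompt = (
        "A model's answer failed an evaluation.\n"
        f"Expected behavior (abstract): {expected_behavior}\n"
        f"Model answer: {model_answer}\n"
        "In one or two sentences, describe the category of failure "
        "(e.g. arithmetic slip, ignored constraint, hallucinated fact). "
        "Do not quote or reconstruct the test input."
    )
    return llm(prompt)
```

Passing the LLM client in as a callable keeps the helper testable and provider-agnostic.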
I'd rate it between Haiku 4.5 (also pretty good for the price) and Sonnet, closer to Sonnet.
Sure, if I weren't cost-sensitive I'd run everything on Opus 4.6, but alas.