Posted by mraniki 3/31/2025

Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison(composio.dev)
483 points | 328 comments | page 4
stared 3/31/2025|
At this level, it is very contextual - depending on your tools, prompts, language, libraries, and the whole code base. For example, for one project, I am generating ggplot2 code in R; Claude 3.5 gives way better results than the newer Claude 3.7.

Compare and contrast https://aider.chat/docs/leaderboards/, https://web.lmarena.ai/leaderboard, https://livebench.ai/#/.

asdf6969 3/31/2025||
Does anyone know guides to integrate this with any kind of big co production application? The examples are all small toy projects. My biggest problems are like there’s 4 packages I need to change and 3 teams and half a dozen micro services are involved.

Does any LLM do this yet? I want to throw it at a project that’s in package and micro service hell and get a useful response. Some weeks I spend almost all my time cutting tickets to other teams, writing documents, and playing politics when the other teams don’t want me to touch their stuff. I know my organization is broken but this is the world I live in.

eugenekolo 3/31/2025||
It's definitely an attempt to compare models, and Gemini clearly won in the tests. But I don't think the tests are particularly good or illustrative. It's generally an easy problem to ask AI to give you greenfield JS code for common tasks, and Leetcode's been done 1000 times on GitHub and Stack Overflow, so the solutions are all right there.

I'd like to see tests that are more complicated for AI, like refactoring an existing codebase, writing a program to auto-play God of War for you, improving the response time of a keyboard driver, and so on.

ldjkfkdsjnv 3/31/2025||
I've been coding with both non-stop the last few days, and Gemini 2.5 Pro is not even close. For complicated bug solving, o1 pro is still far ahead of both. Sonnet 3.7 is best overall.
diggan 3/31/2025||
I think o1 Pro Mode is so infrequently used by others (because of the price) that I've just started adding "besides o1 Pro Mode, if you have access" in my head whenever someone says "This is the best available model for X".

It really is miles ahead of anything else so far, but it's also really pricey, so it makes sense that some people try to find something close to it at a much lower cost.

ldjkfkdsjnv 3/31/2025||
Yeah, it's not even close. In my mind, the $200 a month could be $500 and I would still pay for it. There are many technical problems I have run into where I simply would not have solved the problem without it. I am building more complicated software than I ever have, and I have 10+ years of engineering experience in big tech.
AJ007 3/31/2025||
If you are in a developing country making $500-$1000 a month doing entry-level coding work, then $200 is crazy. On the other hand, your employment at this point is entirely dependent on your employer either having no idea what is going on or being really nice to you. I've also heard complaints from people, in the United States, about not wanting to pay $20 a month for ChatGPT. If the work you are doing is that low-value, you probably shouldn't be on a computer at all.
ldjkfkdsjnv 3/31/2025||
Yeah, it's funny, because I know I could hire someone off Upwork. But I prefer to just tell the model what to code and integrate its results, rather than telling another engineer what to do.
uxx 3/31/2025||
agreed.
nprateem 3/31/2025||
Sometimes these models get tripped up by a mistake. They'll add a comment to the code saying "this is now changed to [whatever]" but won't actually make the replacement. I tell it it hasn't made the fix; it apologises and does it again. Subsequent responses lead to more profuse apologies, with assertions that it's definitely fixed it this time, when it hasn't.

I've seen this occasionally with older Claude models, but Gemini did this to me very recently. Pretty annoying.

mvdtnz 3/31/2025||
I must be missing something about Gemini. When I use the web UI it won't even let me upload source code files directly. If I manually copy some code into a directory and upload that, I do get it to work, but the coding output is hilariously bad. It produces ludicrously verbose code that, so far, has been 200% wrong for me every time.

This is on a Gemini 2.5 Pro free trial. Also - god damn is it slow.

For context this is on a 15k LOC project built about 75% using Claude.

jstummbillig 3/31/2025||
This has not been my experience using it with Windsurf, which touches on an interesting point: When a tool has been optimized around one model, how much is it inhibiting another (newly released) model and how much adjustment is required to take advantage of the new model? Increasingly, as tools get better, we will not directly interact with the models. I wonder how the tool makers handle this.
benbojangles 3/31/2025||
Don't know what the fuss is about over a dino jump game, Claude made me a flappy bird esp32 game last month in one go: https://www.instagram.com/reel/DGcgYlrI_NK/?utm_source=ig_we...
larodi 3/31/2025||
Funny how the "give me a Dinosaur game" single prompt translates into FF's dinosaur 404-not-found game.
uxx 3/31/2025|
Gemini takes parts of the code and just writes "(same as before)" even when I ask it to provide the full code, which for me is a deal breaker.
HarHarVeryFunny 3/31/2025|
Yeah - I tried Gemini 2.0 Flash a few weeks ago, and while the model itself is decent, this was very annoying. It'd generate the full source if I complained, but then the next change would go back to "same as before" ... over and over ...
uxx 3/31/2025||
Yes, it's insane.