Posted by GavinAnderegg 5 days ago

Spending Too Much Money on a Coding Agent (allenpike.com)
152 points | 176 comments
suralind 2 days ago|
How does GitHub Copilot stack up against API access directly from OpenAI, etc.? Is it faster to use API keys than Copilot?
butlike 3 days ago||
I love how paying for prompts stuck. Like, if someone's going to do your homework for you, they should get compensated.
v5v3 2 days ago||
No need to use the most expensive models for every query. Save them for the ones the cheaper models don't do well on.
logifail 2 days ago|
Q: Can you tell in advance whether your query is one that's worth paying more for a better answer?
v5v3 2 days ago||
Most programmers are not asking AI to rewrite the whole app or convert C to Rust.

You wouldn't gain anything from asking the most expensive model to adjust some CSS.

suninsight 2 days ago||
So what we do at NonBioS.ai is use a cheaper model for routine tasks, but switch seamlessly to a higher-reasoning model if the agent gets stuck. It's the most cost-efficient approach, and we take that switching cost away from the engineer.
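
In pseudo-Python the routing is roughly this (a simplified sketch, not our actual implementation; the model calls and the "stuck" heuristic are placeholders, and the heuristic is the hard part):

    def answer(prompt, call_cheap, call_strong, looks_stuck):
        """Try the cheap model first; escalate only when the reply looks stuck."""
        reply = call_cheap(prompt)
        if looks_stuck(reply):           # e.g. failing checks, repeated identical diffs
            reply = call_strong(prompt)  # pay the premium only when it's needed
        return reply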

But I broadly agree with the argument of the post - just spending more might still be worth it.

nickjj 2 days ago||
Serious question, how do you justify paying for any of this without feeling like it's a waste?

I occasionally use ChatGPT (free version without logging in) and the amount of times it's really wrong is very high. Oftentimes it takes a lot of prompting and feeding it information from third party sources before it realizes it has incorrect information and corrects itself.

All of these prompts would be using money on a paid plan right?

I also used Cursor (free trial of their paid plan) for a bit and didn't find much of a difference. I would say whatever back-end it was using was possibly worse. The code it wrote was busted and over-engineered.

I want to like AI, and in some cases it does help me gain insight into something, but I feel like literally 90% of my time is spent on it providing information that straight up doesn't work. Eventually it might work, but getting there takes a lot of time and effort.

BeetleB 2 days ago||
Try with serious models. Here's what I would suggest:

1. Go to https://aider.chat/docs/leaderboards/ and pick one of the top (but not expensive) models. If unsure, just pick Gemini 2.5 Pro (not Flash).

2. Get API access (a quick sanity check is sketched after this list).

3. Find a decent tool (hint: Aider is very good and you can learn the basics in a few minutes).

4. Try it on a new script/program.

5. (Only after some experience): Read people's detailed posts describing how they use these tools and steal their ideas.
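
For step 2, a minimal sanity check that your key works (this assumes Google's google-genai Python SDK and the gemini-2.5-pro model id; adjust for whichever provider/model you picked):

    # pip install google-genai
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")   # key from Google AI Studio
    response = client.models.generate_content(
        model="gemini-2.5-pro",                     # assumed model id; check your provider's docs
        contents="Write a Python one-liner that reverses a string.",
    )
    print(response.text)

If that round-trips, point Aider at the same key and model and start from step 4.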

Then tell us how it went.

abdullahkhalids 2 days ago|||
Depends on how much you use it. I use AI to think through code and other problems, and to write the dumb parts of the code. Claude definitely works much better than the free offerings. I use OpenRouter [1] and spend only a couple of dollars per month on AI usage. It's definitely worth it.
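
OpenRouter speaks the OpenAI-compatible chat API, so a minimal call looks roughly like this (the model slug is just an example; pick whatever fits your budget):

    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
        api_key="YOUR_OPENROUTER_KEY",
    )
    resp = client.chat.completions.create(
        model="anthropic/claude-sonnet-4",        # example slug; any model OpenRouter lists works
        messages=[{"role": "user", "content": "Write the dumb parts of a CSV parser in Python."}],
    )
    print(resp.choices[0].message.content)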

[1] https://openrouter.ai (no affiliation)

jonfw 2 days ago|||
The AI agents that run on your machine, where they have access to the code, tools to browse/edit it, and can even run terminal commands, are much more powerful than a simple chatbot.

It took some time for me to learn how to use agents, but they are very powerful once you get the hang of them.

josefresco 2 days ago||
> much more powerful than a simple chatbot

Claude Pro + Projects is a good middle ground between the two. Things didn't really "click" for me as a non-developer until I got access to both.

chis 2 days ago|||
I can't believe people are still writing comments like this lol how can it be
zzzeek 2 days ago||
I think it's a serious question, because something really big is being missed here. There seem to be very different types of developers out there and/or very different kinds of codebases. Hypothetically, maybe you have devs or contexts where the dev can just write the code really fast, so having to explain it to a bot is more time-consuming, vs. devs/contexts where lots of googling and guessing goes on and it's easier to get the AI to just show you how to do it.

I'm actually employer mandated to continue to try/use AI bots / agents to help with coding tasks. I'm sort of getting them to help me, but I'm still really not blown away, and I still tend to prefer not to bother with them for things I'm frequently iterating on; they are more useful when I have to learn some totally new platform/API. Why is that? Do we think there's something wrong with me?

vineyardmike 2 days ago||
> I'm actually employer mandated to continue to try/use AI bots / agents to help with coding tasks

I think a lot of this comes down to the context management. I've found that these tools work worse at my current employer than my prior one. And I think the reason is context - my prior employer was a startup, where we relied on open source libraries and the code was smaller, following public best practices regarding code structure in Golang and python. My current employer is much bigger, with a massive monorepo of custom written/forked libraries.

The agents are trained on lots of open source code, so popular programming languages/libraries tend to be really well represented, while big internal libraries are a struggle. Similarly, smaller repositories tend to work better than bigger ones, because there is less searching to figure out where something is implemented. I've been trying some coding agents at my current job, and they spend a lot more time searching through libraries trying to understand how to implement or use something if it relies on an internal library.

I think a lot of these struggles and differences are also present with people, but we tend to discount this struggle because people are generally good at reasoning. Of course, we also learn from each task, so we improve over time, unlike a static model.

benbayard 2 days ago|||
I'd try out Cursor with either o3 or Claude 4 Opus; they're much better than the free versions of ChatGPT and Claude. That's also what this article claims, and it's true in my experience.
vineyardmike 2 days ago|||
> Serious question, how do you justify paying for any of this without feeling like it's a waste?

I would invert the question: how can you think it's a waste (for OP) if they're willing to spend $1000/mo on it? This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.

> free version... the amount of times it's really wrong is very high... it takes a lot of prompting and feeding it information from third party

Respectfully, you're using it wrong, and you get what you paid for. The free versions are obviously inferior, because obviously they paywall the better stuff. If OP is spending $50/day, why would the company give you the same version for free?

The original article mentions Cursor. With (paid) Cursor, the tool automatically grabs all the information on behalf of the user. It will grab your code, including grepping to find the right files, and it will grab info from the internet (e.g. up-to-date libraries, etc.), and feed that into the model, which can provide targeted diffs to update just select parts of a file.

Additionally, the tools will automatically run compiler/linter/unit tests to validate their work, and iterate and fix their mistakes until everything works. This write -> compile -> unit test -> lint loop is exactly what a human will do.
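
In spirit, the loop is something like this (a toy sketch, not Cursor's actual implementation; ask_model stands in for whatever LLM call produces the edits):

    import pathlib
    import subprocess

    def run(cmd):
        """Run a check command; return (passed, combined output)."""
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def agent_loop(ask_model, task, checks, max_rounds=5):
        """Toy write -> compile -> test -> lint loop.
        ask_model(task, feedback) is any LLM call returning {path: new_source}."""
        feedback = ""
        for _ in range(max_rounds):
            for path, source in ask_model(task, feedback).items():
                pathlib.Path(path).write_text(source)      # apply the proposed edit
            for cmd in checks:                             # e.g. build, tests, linter
                passed, output = run(cmd)
                if not passed:
                    feedback = f"{cmd} failed:\n{output}"  # feed the failure back to the model
                    break
            else:
                return True                                # every check passed
        return False

The interesting part is what ends up in feedback and how much of the repo the model gets to see; that context assembly is where the paid tools earn their money.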

nickjj 2 days ago|||
> Respectfully, you're using it wrong, and you get what you paid for.

I used the paid (free trial) version of Cursor to look at Go code. I used the free version of ChatGPT for topics like Rails, Flask, Python, Ansible and various networking things. These are all popular techs. I wouldn't describe either platform as "good" if we're measuring good by going from an idea to a fully working solution with reasonable code.

Cursor did a poor job. The code it provided was massively over-engineered, to the point where most of it had to be thrown away because it missed the big picture. This was after a lot of very specific prompting and iterations. The code it provided also straight up didn't work without a lot of manual intervention.

It also started to modify app code to get tests to pass when in reality the test code was the thing that was broken.

Also, it kept forgetting things from 10 minutes ago and repeating the same mistakes. For example, when 3 of its solutions didn't work, it went back and suggested the first solution that had already been confirmed not to work (even though it had just output text explaining why it didn't work).

I feel really bad for anyone trusting AI to write code when you don't already have a lot of experience so you can keep it in check.

So far, at best I find it barely helpful for learning the basics of something new, or for picking out some obscure syntax of a tool I don't know well after giving it a link to the tool's docs and source code.

BeetleB 2 days ago||
> I feel really bad for anyone trusting AI to write code when you don't already have a lot of experience so you can keep it in check.

You definitely should be skilled in your domain to use it effectively.

pxc 2 days ago||||
> This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.

If someone spends a lot of money on something but they don't derive commensurate value from that purchase, they will experience cognitive dissonance proportional to that mismatch. But ceasing or reversing such purchases is only one of the ways to resolve that dissonance. Another possibility is adjusting one's assessment of the value of that purchase. This can be subconscious and automatic, but it can also involve validation-seeking behaviors like reading positive/affirming product reviews.

In this present era of AI hype, purchase-affirming material is very abundant! Articles, blog posts, interviews, podcasts, HN posts... there's plenty telling people that it's time to "get on board", to "invest in AI" both financially and professionally, etc.

How much money people have to blow on experiments and toys probably makes a big difference, too.

Obviously there are limits and caveats to this kind of distortion. But I think the reality here is a bit more complicated than one in which we can directly read the derived value from people's purchasing decisions.

klank 2 days ago|||
> This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.

This is not borne out in my personal experience at all. In my experience, both in the physical and software tool worlds, people are incredibly emotional about their tools. There are _deep_ fashion dynamics within tool culture as well. I mean, my god, editors are the prima donnas of emotional fashion, running roughshod over the developer community for decades.

There was a reason it was called "Tool Time" on Home Improvement.

throwaway984393 2 days ago||
[dead]
delduca 2 days ago||
I just pay $20/month for ChatGPT and spend the entire day coding with its help; no need to pay for tokens, no need to integrate it into your IDE.
deadbabe 3 days ago|
I find it kind of boggling that employers spend $200/month to make employees' lives easier, for no real gain.

That's right. Productivity does go up, but most of these employees aren't really contributing directly to revenue. There is no code-to-dollar pipeline. Finishing work faster means some roadmap items move quicker, but they just move quicker toward true bottlenecks that can't really be resolved quickly with AI. So the engineers sit around doing nothing for longer periods of time, waiting to be unblocked. Deadlines aren't being estimated any tighter; they are still as long as ever.

Enjoy this time while it lasts. Someday employers might realize they need to hire less and just cram more work into individual engineers' schedules, because AI should supposedly make work much easier.

jajko 2 days ago||
Coding an actual solution is what, 5-10% of the overall project time?

I'm not talking about some SV megacorps, where better code can directly affect revenue or valuation, if only slightly, and thus more time is spent coding and debugging; I'm talking about basically all other businesses that somehow need developers.

Even if I were 10x faster, project managers would barely notice. And I would lose a lot of the creative fun that good coding tends to bring. As for debugging: zero help there; it's all on you and your mind and experience.

LLMs are so far banned in my banking megacorp, and I ain't complaining.

francisofascii 2 days ago|||
> Someday employers might realize they need to hire less and just cram more work into individual engineers' schedules

We are already past that point. The high-water mark for devs was, ironically, late 2020 during Covid, before RTO, when we were in high demand.

jayd16 2 days ago|||
There's been pretty widespread layoffs in tech for a few years now.
teiferer 3 days ago||
[dead]