Looked like dog shit, but worked fine till it hit some edge cases.
Had to break the whole thing down again and pretty much start from scratch.
Ultimately not a bad day's work, and I still had it on for autocomplete on doc-strings and such, but like fuck will I be letting an agent near code I do for money again in the near future.
But I still more-or-less have to think like a software engineer. That's not going to go away. I have to make sure the code remains clean and well-organized -- which, for example, LLMs can help with, but I have to make precise requests and (most importantly) know specifically what I mean by "clean and well-organized." And I always read through and review any generated code, and often tweak the output, because at the end of the day I am responsible for the code base: I need to verify quality, be able to answer questions, and do all of the usual soft-skill engineering stuff. Etc. Etc.
So do whatever fits your need. I think LLMs are a massive multiplier because I can focus on the actual engineering stuff and automate away a bunch of the boring shit.
But when I read stuff like:
"I lost all my trust in LLMs, so I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature."
I feel like I'm hearing something like, "I decided to build a house! So I hired some house builders and told them to build me a house with three bedrooms and two bathrooms and they wound up building something that was not at all what I wanted! Why didn't they know I really liked high ceilings?"
I hear this frequently from LLM aficionados. I have a couple of questions about it:
1) If there is so much boilerplate that it takes a significant amount of coding time, why haven't you invested in abstracting it away?
2) The time spent actually writing code is not typically the bottleneck in implementing a system. How much do you really save over the development lifecycle when you have to review the LLM output in any case?
Oftentimes there's a lot of repetition in the app I'm working on, and a lot of it has already been abstracted away, but we still have to import the component, its dependencies, and set up the whole thing, which is indeed pretty boring. It really helps to tell the LLM to implement something and point it to an example of the style I want.
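To make that kind of wiring concrete, here's a minimal sketch (assuming a React/TypeScript app; the names DataTable, useDataSource, and ordersColumns are hypothetical) of the setup that repeats on every screen even though the abstractions already exist -- exactly the sort of thing I hand to the LLM along with a pointer to an existing page as a style example:

```tsx
// Hypothetical sketch of the repetitive wiring described above.
// The names (DataTable, useDataSource, ordersColumns) are made up for illustration.
import * as React from "react";
import { DataTable } from "@app/components/DataTable";     // shared component, already abstracted
import { useDataSource } from "@app/hooks/useDataSource";  // its data dependency
import { ordersColumns } from "./ordersColumns";           // per-screen column config

export function OrdersPage() {
  // The same three steps on every new screen: fetch, configure, render.
  const orders = useDataSource("orders");
  return <DataTable columns={ordersColumns} rows={orders} />;
}
```

None of it is hard; it's just the same shape every time, which is exactly where pointing the LLM at an existing example pays off.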
I feel like LLMs are already doing quite a lot. I spend less time rummaging through documentation or trying to remember obscure APIs or other pieces of code in a software project. All I need is a strong mental model of the project and how things are done.
There is a lot of obvious heavy lifting that LLMs are doing that I, for one, don't take for granted.
For people working in a resource-constrained economic environment, the benefits of any technology that helps them spend less time on work that doesn't deliver value are immediately obvious.
It is no longer an argument about whether it is hype or not; it is more about how best to use it to achieve your goals. Forget the hype. Forget the marketing of AI companies -- they have to do that to sell their products, and there's nothing wrong with that. Don't let companies or bloggers set your expectations of what could or should be done with this piece of tech. Just get on the bandwagon, experiment, and find out for yourself what is too much. In the end I feel we will all come away from these experiments knowing that LLMs are already doing quite a lot.
TRIVIA: I even came across this article: https://www.greptile.com/blog/ai-code-reviews-conflict. It clearly points out how LLM reliance can bring both the 10x dev and the 1x dev closer to a median of "goodness". So the 10x dev probably gets worse and the 1x dev ends up getting better -- I'm probably that guy, because I tend to miss subtle things in code and Copilot review has had my ass for a while now -- I haven't had defects like that in a while.
Personally, the initial excitement has worn off for me, and I am enjoying writing code myself, just using Kagi Assistant to ask the odd question, mostly for research.
When a teammate who bangs on about how we should all be using AI tried to demo it and got things in a bit of a mess, I knew we had peaked.
And all that money invested into the hype!