Posted by Bogdanp 4 days ago
Last week I asked it to look at why a certain device enumeration caused a sigsegv, and it quickly solved the issue by completely removing the enumeration. No functionality, no bugs!
I feel like the article is giving out very bad advice which is going to end up shooting someone in the foot.
The author uses an LLM to find bugs, then throws away the fix and instead writes the code he would have written anyway. This seems like a rather conservative application of LLMs. Using the 'shooting someone in the foot' analogy - this article is an illustration of professional and responsible firearm handling.
AIs are very capable heuristic tools. Being able to "sniff test" things blind is their specialty.
I.e., treat them like an extremely capable gas detector that can tell you there is a leak and where in the plumbing it is, not a plumber who can fix the leak for you.
Before I used Claude, I would have been surprised.
I think it works because Claude takes some standard coding issues and systematizes them. The list is long, but Claude doesn't run out of patience like a human being does. Or at least it has some credulity left after trying a few initial failed hypotheses. This being a cryptography problem helps a little bit, in that there are very specific keywords that might hint at a solution, but from my skim of the article, it seems like it was mostly a good old coding error, taking the high bits twice.
The standard issues are just a vague laundry list:
- Are you using the data you think you're using? (Bingo for this one)
- Could it be an overflow?
- Are the types right?
- Are you calling the function you think you're calling? Check internal, then external dependencies
- Is there some parameter you didn't consider?
And a bunch of others. When I ask Claude for a debug, it's always something that makes sense as a checklist item, but I'm often impressed by how it diligently followed the path set by the results of the investigation. It's a great donkey, really takes the drudgery out of my work, even if it sometimes takes just as long.
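To make the "are you using the data you think you're using?" item concrete, here is a minimal Python sketch of a "taking the high bits twice" style mistake. This is not the code from the article; the function names and the 16-bit split are hypothetical, chosen only to illustrate the class of bug and the kind of round-trip check that catches it.

```python
def split_buggy(x: int) -> tuple[int, int]:
    """Split a 16-bit value into (high byte, low byte), with a copy-paste bug."""
    hi = (x >> 8) & 0xFF
    lo = (x >> 8) & 0xFF  # bug: this reads the high byte again instead of x & 0xFF
    return hi, lo


def split_fixed(x: int) -> tuple[int, int]:
    """Split a 16-bit value into (high byte, low byte)."""
    hi = (x >> 8) & 0xFF
    lo = x & 0xFF
    return hi, lo


def roundtrips(split, x: int) -> bool:
    """Check that recombining the two halves gives back the original value."""
    hi, lo = split(x)
    return ((hi << 8) | lo) == x


if __name__ == "__main__":
    for name, fn in (("buggy", split_buggy), ("fixed", split_fixed)):
        print(name, roundtrips(fn, 0xABCD))
```

Note that the buggy version still round-trips for inputs whose two bytes happen to be equal (0xABAB, say), which is part of why this kind of error can slip past a casual test.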
I've flat out had Claude tell me its task was getting tedious, and it will often grasp at straws to use as excuses for stopping a repetitive task and moving on to something else.
Keeping it on task when something keeps moving forward is easy, but when it gets repetitive it takes a lot of effort to make it stick to it.
Some global rules will generally keep it on track though: I tell it to ask me before it simplifies or gives up, and I frequently ask it to ask me clarifying questions, which generally also helps keep it chugging in the right direction and uncovers gaps in its understanding.
It very much does! I had a debugging session with Claude Code today, and it was about to give up with a message along the lines of “I am sorry I was not able to help you find the problem”.
It took some gentle cheering (pretty easy, just saying “you are doing an excellent job, don’t give up!”) and encouragement, and a couple of suggestions from me on how to approach the debug process for it to continue and finally “we” (I am using plural here because some information that Claude “volunteered” was essential to my understanding of the problem) were able to figure out the root cause and the fix.
Context Usage • Used: 112K/200K tokens (56%) • Remaining: 88K tokens • Sufficient for continued debugging, but fresh session recommended for clarity
lol. I said ok use a subagent for clarity.
As another example, I think things like "write unit tests for this code" are usually a similar sort of style transfer as well, based on how it writes the tests. It definitely has a good idea of how to ensure that all the functionality gets tested. I find it is less likely to come up with "creative" ways that bugs may surface, but hey, it's a good start.
This isn't a criticism, it's intended to be a further exploration and understanding of when these tools can be better than you might intuitively think.
LLMs built by trillion dollar companies will do it for me.