Claude Code can debug low-level cryptography

Posted by Bogdanp 4 days ago

Claude Code can debug low-level cryptography (words.filippo.io)

465 points | 208 comments | page 2

gdevenyi 3 days ago|

Coming soon, adversarial attacks on LLM training to ensure cryptographic mistakes.

nikanj 3 days ago||

I'm surprised it didn't fix it by removing the code. In my experience, if you give Claude a failing test, it fixes it by hard-coding the code to return the value expected by the test or something similar.

Last week I asked it to look at why a certain device enumeration caused a sigsegv, and it quickly solved the issue by completely removing the enumeration. No functionality, no bugs!

pessimizer 3 days ago|

I've got a paste-in prompt that reiterates multiple times not to remove features or debugging output without asking first, and not to blame the test file/data that the program failed on. Repeated multiple times, the last time in all caps. It still does it. I hope maybe half as often, but I may be fooling myself.

phendrenad2 3 days ago||

A whole class of tedious problems has been eliminated by LLMs because they are able to look at code in a "fuzzy" way. But this can be a liability, too. I have a codebase that "looks kinda" like a nodejs project, so AI agents usually assume it is one. Even if I rename the package.json, the agent will inspect the contents and immediately clock it as "node-like".

didibus 3 days ago||

This is basically the ideal scenario for coding agents. Easily verifiable through running tests, pure logic, algorithmic problem. It's the case that has worked the best for me with LLMs.

cluckindan 3 days ago||

So the ”fix” includes a completely new function? In a cryptography implementation?

I feel like the article is giving out very bad advice which is going to end up shooting someone in the foot.

thadt 3 days ago||

Can you expand on what you find to be 'bad advice'?

The author uses an LLM to find bugs, then throws away the fix and instead writes the code he would have written anyway. This seems like a rather conservative application of LLMs. Using the 'shooting someone in the foot' analogy - this article is an illustration of professional and responsible firearm handling.

sciencejerk 3 days ago|||

Laymen in cryptography (that's 99% of us at least) may be encouraged to deploy LLM-generated crypto implementations without understanding the crypto.

9dev 3 days ago||

If they consider doing that, they will without LLMs or with them. Raise your juniors right.

lisbbb 3 days ago|||

Honestly, it read more like attention seeking. He "live coded" his work, by which I believe he means he streamed everything he was doing while working. It just seems so much more like a performance and building a brand than anything else. I guess that's why I'm just a nobody.

OneDeuxTriSeiGo 3 days ago||

The article even states that the solution claude proposed wasn't the point. The point was finding the bug.

AIs are very capable heuristic tools. Being able to "sniff test" things blind is their specialty.

i.e. Treat them like an extremely capable gas detector that can tell you there is a leak and where in the plumbing it is, not a plumber who can fix the leak for you.

rizky05 3 days ago||

[dead]

lordnacho 3 days ago||

I'm not surprised it worked.

Before I used Claude, I would be surprised.

I think it works because Claude takes some standard coding issues and systematizes them. The list is long, but Claude doesn't run out of patience like a human being does. Or at least it has some credulity left after trying a few initial failed hypotheses. This being a cryptography problem helps a little bit, in that there are very specific keywords that might hint at a solution, but from my skim of the article, it seems like it was mostly a good old coding error, taking the high bits twice.

The standard issues are just a vague laundry list:

- Are you using the data you think you're using? (Bingo for this one)

- Could it be an overflow?

- Are the types right?

- Are you calling the function you think you're calling? Check internal, then external dependencies

- Is there some parameter you didn't consider?

And a bunch of others. When I ask Claude for a debug, it's always something that makes sense as a checklist item, but I'm often impressed by how it diligently followed the path set by the results of the investigation. It's a great donkey, really takes the drudgery out of my work, even if it sometimes takes just as long.
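
To make the first checklist item concrete, here is a minimal Python sketch of the "taking the high bits twice" class of bug. It is not the code from the article: `SHIFT`, `high_bits`, `adjusted_high_bits`, `encode_buggy`, and `encode_fixed` are hypothetical names, and the 8-bit split is an assumption chosen only for illustration.

```python
# Hypothetical illustration only; not the code discussed in the article.
SHIFT = 8  # assumed split point: the low 8 bits vs. everything above them


def high_bits(x: int) -> int:
    """Return the part of x above the low SHIFT bits."""
    return x >> SHIFT


def adjusted_high_bits(x: int, hint: int) -> int:
    """Hypothetical helper whose result is ALREADY in high-bits form."""
    return high_bits(x) + hint


def encode_buggy(x: int, hint: int) -> int:
    # Bug: the helper already returned high bits, so this shifts a second time.
    return high_bits(adjusted_high_bits(x, hint))


def encode_fixed(x: int, hint: int) -> int:
    # Fix: use the helper's result directly.
    return adjusted_high_bits(x, hint)


if __name__ == "__main__":
    x = 0x12345
    print(hex(encode_fixed(x, 0)))  # 0x123
    print(hex(encode_buggy(x, 0)))  # 0x1
```

Both versions type-check and run without error, which is why "are you using the data you think you're using?" is worth asking explicitly: only comparing against a known-good value exposes the difference.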

vidarh 3 days ago||

> The list is long, but Claude doesn't run out of patience like a human being does

I've flat out had Claude tell me its task was getting tedious, and it will often grasp at straws to use as excuses for stopping a repetitive task and moving on to something else.

Keeping it on task when something keeps moving forward is easy, but when it gets repetitive it takes a lot of effort to make it stick to it.

8note 3 days ago||

I've been getting that experience with both claude-code and gemini, but not from cline and qcli. I wonder why.

danielbln 3 days ago||

Different system prompts, also Claude Code is aware of its own context and will sometimes try to simplify or take a short cut as the context nears exhaustion.

Some global rules will generally keep it on track though, such as telling it to ask me before it simplifies or gives up. I also frequently ask it to ask me clarifying questions, which generally helps keep it chugging in the right direction and uncovers gaps in its understanding.

ay 3 days ago||

> Claude doesn't run out of patience like a human being does.

It very much does! I had a debugging session with Claude Code today, and it was about to give up with a message along the lines of “I am sorry I was not able to help you find the problem”.

It took some gentle cheering (pretty easy, just saying “you are doing an excellent job, don’t give up!”) and encouragement, and a couple of suggestions from me on how to approach the debug process for it to continue and finally “we” (I am using plural here because some information that Claude “volunteered” was essential to my understanding of the problem) were able to figure out the root cause and the fix.

lordnacho 3 days ago|||

That's interesting, that only happened to me on GPT models in Cursor. It would apologize profusely.

ericphanson 3 days ago|||

Claude told me it stopped debugging since it would run out of tokens in its context window. I asked how many tokens it had left and it said actually it had plenty so could continue. Then again it stopped, and without me asking about tokens, wrote

Context Usage
• Used: 112K/200K tokens (56%)
• Remaining: 88K tokens
• Sufficient for continued debugging, but fresh session recommended for clarity

lol. I said ok use a subagent for clarity.

zcw100 3 days ago||

I just recently found a number of bugs in both the RELIC and MCL libraries. It took a while to track them down but it was remarkable that it was able to find them at all.

jerf 2 days ago||

This is one of the things I've mentioned before. I think it's just hidden a bit and hard to see, but this is basically the LLM doing style transfer, which they're really good at. There was a specification for the code (which the LLM apparently already had from training, since it didn't have to go fetch it yet had intimate knowledge of it), there was an implementation, and it's really good at extracting out the style difference between code and spec. Anything that looks like style transfer is a good use for LLMs.

As another example, I think things like "write unit tests for this code" are usually a similar sort of style transfer as well, based on how it writes the tests. It definitely has a good idea of how to ensure that all the functionality gets tested. I find it is less likely to produce "creative" ways that bugs may come out, but hey, it's a good start.

This isn't a criticism, it's intended to be a further exploration and understanding of when these tools can be better than you might intuitively think.

deadbabe 3 days ago|

With AI, we will finally be able to do the impossible: roll our own crypto.

oytis 3 days ago||

It is very much possible, it's just a bad idea. Doubly so with AI.

lisbbb 3 days ago|||

You're not going to do better than the NSA.

marginalia_nu 3 days ago|||

To be fair you're also not going to be backdoored by the NSA.

deadbabe 3 days ago|||

I don’t have to.

LLMs built by trillion dollar companies will do it for me.

tptacek 3 days ago||

That's exactly not what he's doing.