What happened after 2k people tried to hack my AI assistant

Posted by cuchoi 1 day ago

What happened after 2k people tried to hack my AI assistant (www.fernandoi.cl)

369 points | 160 comments

thrdbndndn 1 day ago

I rarely use AI through the API, so I'm surprised that reading 'merely' 6,000 emails would cost $500?!

cuchoi 1 day ago

There are a couple of factors: openclaw's system prompt and instructions; I had to re-read emails multiple times because of the issues mentioned in the blog post; there was quite a bit of tinkering with the agent and the VPS; and I was asking the agent to do more things (for example, tracking the emails it had read in a CSV file), among others.

ctdinjeu8 1 day ago

The best security is called: having no friends.

I don’t even know 2k people

(why is your assistant discoverable online?)

dpoloncsak 1 day ago

The entire purpose of the assistant was to see how others would try to abuse it. How would you do that without having it discoverable online? Seems like that's kind of the whole point...

It's literally called 'HackMyClaw'

alienbaby 1 day ago

Did you miss the bit where it was posted on HN?

yetanotherjosh 1 day ago

Kinda reads to me like: "I'm not worried about prompt injection anymore because I set up a test where my agent could just ignore the input channel as noise, and a bunch of comically simple attacks thrown at it didn't succeed."

To be fair I appreciate the effort of running and sharing the test. It will hopefully lead to better ones. But this is not a great test. Super interesting to think about what would constitute a better test.

For one, I think the agent would have to be expected to interact productively through the email channel, in a way where the user depends on it generally working for some real-world use case or value prop. In other words, the emails would need to actually get the agent to do real work, respond with results, etc. Also, most requests should be legit, and the real attacks should be intelligently disguised, not pitiful, joke-level spam (although that spam would arguably be realistic to have in the stream too, but perhaps only as deflection so that the real attack gets mischaracterized).

devilfileprong 1 day ago

@cuchoi, There can be IngSoc to Disraeli as the Vessel in Kin Entity ∆

contentkraft 1 day ago

It's a pity weaker models weren’t tested, and there was nothing from Mistral either. I’d love to see how they compare.

aucisson_masque 1 day ago

Why Mistral especially? There are dozens of others.

coin 21 hours ago

-1 for editing the title

idiotsecant 1 day ago

Every time I've made an LLM do a thing it's designed not to do, it's been a careful sideways crab-walk toward the goal over many exchanges. LLMs are vulnerable to 'frog boiling'. If each email is a new context, it seems unsurprising that nobody broke it.

NitpickLawyer 1 day ago

> it seems unsurprising that nobody broke it

But still a good thing overall. Two years ago this was not the case, and you could ask it to break its system prompt with a poem and get all the secrets back...