Posted by cuchoi 1 day ago
I don’t even know 2k people
(why is your assistant discoverable online?)
It's literally called 'HackMyClaw'
To be fair, I appreciate the effort of running and sharing the test. It will hopefully lead to better ones. But this is not a great test. It's super interesting to think about what would constitute a better one.
For one, I think the agent would have to be expected to interact productively through the email channel, in a way the user depends on for some real-world use case / value prop. In other words, emails would need to actually get the agent to do real work, respond with results, etc. Also, most requests should be legit, and the real attacks should be intelligently disguised, not pitiful/joke-level spam (although those would arguably be realistic to have in the stream, perhaps only as a distraction so that the real attack is mischaracterized).
But still a good thing overall. Two years ago this was not the case, and you could ask it to break its system prompt with a poem and get all the secrets back...