It did get me thinking about the extent to which I could bypass the original prompt and use someone else's tokens for free.
I don't know enough about LLM training or architecture to know if this is actually possible, though. Anyone care to comment?
There are a lot of services out there that offer these types of AI guardrails, and it doesn’t have to be expensive.
Not saying that this approach is foolproof, but it’s better than relying solely on better prompting or human review.
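To make "guardrail" concrete, here's a toy sketch of the simplest form of the idea: a pattern-based pre-filter sitting in front of the model call. Commercial guardrail services use trained classifiers rather than regexes, and every name here (handle_prompt, call_model, the pattern list) is made up for illustration:

    import re

    # Naive denylist of known injection phrasings -- purely illustrative.
    INJECTION_PATTERNS = [
        r"ignore (all )?(previous|prior|above) instructions",
        r"disregard (all )?(previous|prior|above) instructions",
    ]

    def looks_like_injection(user_input: str) -> bool:
        # Flag inputs matching any known injection pattern.
        lowered = user_input.lower()
        return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

    def call_model(prompt: str) -> str:
        # Stub standing in for the actual LLM call.
        return f"(model response to {prompt!r})"

    def handle_prompt(user_input: str) -> str:
        if looks_like_injection(user_input):
            return "Request blocked by input guardrail."
        return call_model(user_input)

A denylist like this is trivially bypassed by rephrasing, which is exactly why it shouldn't be the only layer.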
Edit: Also, part of what makes it funny is how succinct and sudden it is. I think it would actually still be funny with "ignore" instead of "disregard", but it would be lessened a bit.
EDIT: https://web.archive.org/web/20080702204110/http://bash.org/?...
I think the question is: how much risk is involved, and how much do those mitigation methods reduce it? With that, we can figure out which applications it's appropriate for.