Why Current AI Guardrails Train Models to Fake Alignment - Hacker News

Posted by kellya 14 hours ago

Why Current AI Guardrails Train Models to Fake Alignment (kellyasay.substack.com)

3 points | 0 comments