Ask HN: How do you test AI-generated code?

When AI generates code, I first ask the model to find, fix, and verify any issues in it. After that, I start the server and check whether it actually works from the user’s perspective.

What I’m looking for is a workflow where issues are received, fixed, tested, and deployed. However, current AI agents don’t seem very good at running browser tests from the user’s perspective.

I’ve tried the built-in browsers in Codex and Cursor, but they often only check whether the page loads. In the end, I had to walk them through every step, and it turned out to be cheaper and faster to test it myself.

So I’m curious how you’ve set up test automation. Are there any services that do this for individuals, not just enterprises? If you’re using a harness like Codex, what instructions and skills do you give it so that it tests from the user’s perspective?