Posted by jadelcastillo 5 days ago
Using unboundedly unreliable systems to evaluate reliability is just a bad premise.
I've got a plan for a taskmaster agent that reviews other agents' work, but I hadn't figured out how to trigger it selectively in response to traces so that it stays cheap. This might work if extended.
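
A minimal sketch of what trace-based triggering could look like, assuming each agent run is summarized into a small record. The `Trace` fields, the `should_review` heuristics, and the `max_tool_calls` threshold are all hypothetical placeholders, not taken from any real tracing library; the point is only that a cheap filter decides when the expensive reviewer runs.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    # Hypothetical per-run summary; these field names are assumptions.
    agent_id: str
    tool_calls: int
    errors: int
    retries: int


def should_review(trace: Trace, max_tool_calls: int = 20) -> bool:
    """Cheap check: flag only runs that look suspicious."""
    if trace.errors > 0 or trace.retries > 0:
        return True
    # Unusually long runs are also worth a second look.
    return trace.tool_calls > max_tool_calls


def select_for_review(traces: list[Trace]) -> list[Trace]:
    """Return the subset of traces the reviewer agent should see."""
    return [t for t in traces if should_review(t)]


traces = [
    Trace("a1", tool_calls=3, errors=0, retries=0),
    Trace("a2", tool_calls=5, errors=1, retries=0),
    Trace("a3", tool_calls=40, errors=0, retries=0),
]
flagged = select_for_review(traces)
```

Here only `a2` and `a3` would be passed to the reviewer, so the cost of reviewing scales with the number of suspicious runs rather than with all runs.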