Posted by vantareed 12 hours ago
It used to work okay, but a while back they shipped a major regression that hit an entire team of folks I work with.
No response, no workaround.
You can install pi, then install pi-sandbox pinned to the current version. This issue describes how pi-sandbox plus an additional extension gives you a sandboxed setup where you can still fall back to running unsandboxed, with approval required: https://github.com/carderne/pi-sandbox/issues/50
My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).
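For anyone wondering what "locked down" can mean in practice, here's a rough sketch that just builds a `podman run` command line. The specific flag set (no network, read-only root, all capabilities dropped, no-new-privileges, only the project dir mounted writable) is my assumption of a reasonable profile, not the setup described above, and the image name and paths are placeholders.

```python
import shlex


def locked_down_podman_args(image: str, workdir: str, command: list[str]) -> list[str]:
    """Build a `podman run` argv for an agent container.

    Illustrative hardening profile only; tune it for your own threat model.
    """
    return [
        "podman", "run",
        "--rm",                               # throw the container away afterwards
        "--network=none",                     # no network access at all
        "--read-only",                        # root filesystem is read-only
        "--cap-drop=ALL",                     # drop every Linux capability
        "--security-opt=no-new-privileges",   # block setuid-style escalation
        "--userns=keep-id",                   # rootless: map host user into the container
        "--volume", f"{workdir}:/work:rw,Z",  # only the project dir is writable
        "--workdir", "/work",
        image,
        *command,
    ]


if __name__ == "__main__":
    # Placeholder image and path; prints the command instead of running it.
    args = locked_down_podman_args(
        "docker.io/library/python:3.12-slim", "/home/me/project", ["python", "-V"]
    )
    print(shlex.join(args))
```

Printing rather than executing keeps the sketch safe to run on a machine without Podman; pass the list to `subprocess.run` once you're happy with the flags.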
But an LLM has a limited "memory". The instructions might land in there with enough priority to be "respected", but it only takes one instance of that memory getting too full, or the LLM autocompleting the workaround because it was the statistically "best" solution, and any barriers that exist only in LLM instructions rather than in hardcoded guards will evaporate like so much morning fog.
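To make the instructions-vs-hardcoded-guards distinction concrete, here's a minimal sketch of a guard that lives in code: commands on a (hypothetical) allowlist run, anything else needs an explicit approval callback, and nothing the model "remembers" or forgets changes that. `ALLOWED_PROGRAMS`, `guard_command`, and `CommandRejected` are made-up names for illustration, not part of pi or pi-sandbox.

```python
from typing import Callable

# Hypothetical allowlist: programs the agent may run without asking.
ALLOWED_PROGRAMS = frozenset({"ls", "cat", "grep"})


class CommandRejected(Exception):
    """Raised when a command is neither allowlisted nor approved."""


def guard_command(argv: list[str], approve: Callable[[list[str]], bool]) -> list[str]:
    """Return argv unchanged if it may run; raise CommandRejected otherwise.

    The check is ordinary code, so it holds no matter what is (or isn't)
    in the model's context window.
    """
    if not argv:
        raise CommandRejected("empty command")
    if argv[0] in ALLOWED_PROGRAMS:
        return argv
    # Anything off the allowlist needs an explicit human decision.
    if approve(argv):
        return argv
    raise CommandRejected(f"not approved: {argv[0]}")


if __name__ == "__main__":
    deny_all = lambda argv: False
    print(guard_command(["ls", "-la"], deny_all))
    try:
        guard_command(["curl", "https://example.com"], deny_all)
    except CommandRejected as exc:
        print("blocked:", exc)
```

The point isn't the allowlist contents; it's that the model can talk itself into a workaround, but it can't talk its way past an `if` statement.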
Come on, this is such an easy thing to forget to test. Don't act like there's some magical testing strategy that would have caught this.
Integration testing could/should catch this, especially for a client-side app.
Simple constraints are a good thing: "Our app shouldn't use more than 50 MB of RAM or 3 GB of disk space."
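As a rough sketch of turning the disk half of that into an integration-test check: walk the app's data directory and fail if it's over budget. The directory, the function names, and reading "3 GB" as 3 × 10^9 bytes are all assumptions; a RAM check would follow the same pattern but needs a platform-specific way to read peak memory.

```python
import os
import tempfile

# Assumed interpretation of "3 GB" (decimal gigabytes).
DISK_BUDGET_BYTES = 3 * 10**9


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def check_disk_budget(app_data_dir: str, budget: int = DISK_BUDGET_BYTES) -> int:
    """Fail the test if app_data_dir exceeds budget; return bytes used."""
    used = directory_size_bytes(app_data_dir)
    assert used <= budget, f"{app_data_dir} uses {used} bytes, budget is {budget}"
    return used


if __name__ == "__main__":
    # Stand-in for "run the app, then inspect its data dir".
    with tempfile.TemporaryDirectory() as data_dir:
        with open(os.path.join(data_dir, "cache.bin"), "wb") as f:
            f.write(b"\0" * 1024)
        print(check_disk_budget(data_dir))
```

In a real suite you'd run the app through a representative workload first, then call `check_disk_budget` on wherever it actually writes.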
It's fascinating how offensive some of this verbiage becomes to you when you see it attached to LLM output too much.