Launch HN: Freestyle: Sandboxes for AI Coding Agents

Posted by benswerd 4 hours ago

We’re Ben and Jacob, cofounders of Freestyle (https://freestyle.sh). We’re building a cloud for Coding Agents.

The first generation of agents looked like workflows with minimal tools. Two years ago we published a package to let AI work in SQL; at that time GPT-4 could write simple scripts. Soon after, the first AI app builders started using AI to make whole websites, and we supported that with a serverless deploy system.

But the current generation is going much further. Instead of minimal tools and basic serverless apps, AI can utilize the full power of a computer (“sandbox”). We’re building sandboxes that are interchangeable with EC2 instances from your agent’s perspective, with bonus features:

1. We’ve figured out how to fork a sandbox horizontally with no more than a 400ms pause. That's not just forking the filesystem; we mean forking its whole memory. If you’re halfway down a browser page with animations running, they’ll be in the same place in all the forks. If you’re running a Minecraft server, every block and player will be in the same place on the forks. If you’re running a local environment and an error comes up in a process, that error will be there in all the forks. This works for snapshotting as well: you can save your place and come back weeks later.

2. Our sandboxes start in ~500ms.
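
The whole-memory fork in item 1 has a loose single-process analogy in POSIX fork(): each child starts with a copy of the parent's in-memory state as it was at the moment of the fork, and later changes in a child stay in that child. The sketch below only illustrates those semantics. It is not how Freestyle forks a VM, and `fork_and_read` and the `scroll_y` state are hypothetical names made up for the example (Unix-only, since it relies on `os.fork`).

```python
import os


def fork_and_read(state, forks=3):
    """Fork `forks` children; each reports the in-memory state it inherited."""
    reports = []
    for i in range(forks):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: starts with a copy of the parent's memory at fork time.
            os.close(read_fd)
            state["fork_id"] = i  # this mutation is visible only in this child
            os.write(write_fd, repr(sorted(state.items())).encode())
            os._exit(0)
        # Parent: close our write end so read() sees EOF when the child exits.
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            reports.append(pipe.read())
        os.waitpid(pid, 0)
    return reports


if __name__ == "__main__":
    page = {"scroll_y": 1200}  # stand-in for "half way down a browser page"
    for report in fork_and_read(page):
        print(report)
    print("parent still has:", page)
```

Every child reports the same `scroll_y` it inherited, while the parent's dictionary never gains a `fork_id` key.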

Demo: https://www.loom.com/share/8b3d294d515442f296aecde1f42f5524

Compared with other sandboxes, our goal is to be the most powerful. We support full Linux + hardware virtualization, eBPF, FUSE, etc. We run full Debian with multiple users, and we use a systemd init instead of runc. Whatever your AI expects to work on Debian should work on these VMs, and if it doesn’t, send us a bug report.

In order to make this possible, we’ve moved to our own bare metal racks. Early in our testing we realized that moving VMs across cloud nodes would not have acceptable performance properties. We asked Google Cloud and AWS for quotes on their bare metal nodes and found that the monthly cost was equivalent to the total cost of the hardware, so we bought our own hardware instead.

Our goal is to build the necessary infrastructure to replicate the human devloop on the massively multi-tenant scale of AI, so these VMs should be as powerful as the ones you’re used to, while also being available to provision in seconds.

106 points | 58 comments

jnstrdm05 2 hours ago

How many seconds to provision are we talking about here? 1 second vs. 60 is a dealbreaker for me; some clarity on that would be nice.

benswerd 2 hours ago

500ms. Less than 1 second. We're aiming to get that down to 200ms in the next 3 months.

maxmaio 2 hours ago

Congrats Ben and Jacob!

Fraaaank 3 hours ago

Your pricing page is broken

benswerd 3 hours ago

Reviewing this now. Our public pricing at www.freestyle.sh/pricing seems to be working. Can you point me in a more specific direction?

rasengan 3 hours ago

Interesting!

We're working on a similar solution at UnixShells.com [1]. We built a VMM that forks and boots in < 20ms, and it is live and serving customers! We also have a lot of great tools available, under the MIT license, in our GitHub repo [2].

[1] https://unixshells.com

[2] https://github.com/unixshells

tomComb 2 hours ago

Can your service scale RAM, the way Docker Desktop does? Manual is fine.

benswerd 1 hour ago

Yep, you can choose RAM + disk + CPU size.

tomComb 16 minutes ago

? You say 'yes' but you seem to be answering a different question. Docker Desktop only makes me choose a max RAM; it dynamically scales RAM usage. I don't need fully automatic scaling like that, but the ability to vertically scale RAM for an existing instance is really important, particularly given the cost of RAM these days.

dominotw 2 hours ago

Dumb question: none of these protect you from prompt injection, yes?

benswerd 2 hours ago

No, but the goal is that if you are faced with prompt injection, the worst-case scenario is that the AI uses that computer badly.

dominotw 2 hours ago

Unless I am misunderstanding, I'm not sure how this computer prevents secrets from my Gmail from leaking. That's the worst case.

benswerd 1 hour ago

If you put your Gmail credentials into a VM that an AI agent dealing with untrusted prompts has access to, they should be treated as leaked and disabled immediately.

However, if you don't put your administrative credentials inside the VM and you treat it as an unsafe environment, you can safely give it minimal permissions to access the specific things it needs, and with that access it can perform complex tasks.

dominotw 1 hour ago

I am talking about this, not my Gmail credentials:

https://simonwillison.net/2024/Mar/5/prompt-injection-jailbr...

siva7 3 hours ago

I have so many interesting problems in AI, and sandboxing isn't one of them. It's a pointless exercise, yet disproportionately many people love to do it. Probably because sandboxing doesn't feel as magic as agents themselves and feels more like the old times of "traditional" software development.

hobofan 3 hours ago

It is a mostly pointless exercise if the goal is trying to contain negative impact of AI agents (e.g. OpenClaw).

It is a very necessary building block for many common features that can be steered in a more deterministic way, e.g. "code interpreter" feature for data analysis or file creation like commonly seen in chat web UIs.

moezd 2 hours ago

Believe it or not, once you start working for a regulated industry, it is all you would ever think of. There, people don't care if you are vibing with the latest libraries and harnesses or if it's magic, they care that the entire deployment is in some equivalent of a Faraday cage. Plus, many people just don't appreciate it when their agents go rm -rf / on them.

iterateoften 3 hours ago

Yeah, idk, I guess it’s interesting if you are an engineer looking for something to do.

But I see multiple sandbox-for-agents products a week. It's way too saturated a market.

benswerd 3 hours ago

I disagree (as a sandboxing company).

With respect to the market, every single sandbox sucks. I'm not gonna shit talk competitors, but there is not a good sandboxing platform out there yet, including ours, compared to where we'll be in 6 months.

We've heard that every platform has persistent issues with uptime, feature completeness, networking, and debugging. And on our own platform we're not 1/10th of the way through solving the requests we've gotten.

Next generation of Agents needs computers, and those computers are gonna look really different than "sandboxes" do today.

tcdent 2 hours ago

I don't think you're wrong, but if you really want to rethink the approach, building an orchestration layer for Firecracker like every other company in the space is doing is probably not it.

borakostem 3 hours ago

[flagged]

benswerd 3 hours ago

So this is an ongoing optimization point; no perfect solution exists. Freestyle VMs work with a network namespace and a virtual ethernet cable going into them, so they all think they have the same IP.

This means that while complex protocol connections like remote Postgres can break in the forks, stuff like WebSockets just reconnects automatically.
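
For connections that do not reconnect on their own, a client inside a fork can retry with backoff. This is a minimal sketch of that generic pattern, not anything Freestyle-specific: `connect_with_backoff` and its `connect` argument are hypothetical names, and `connect` stands in for whatever call opens your Postgres or other client connection and raises `ConnectionError` on failure.

```python
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `connect` until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay *= 2


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_connect():
        # Hypothetical connection that fails twice, as a stale socket might.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("connection reset after fork")
        return "connected"

    print(connect_with_backoff(flaky_connect))
```

The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting.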
