Show HN: Smol machines – subsecond coldstart, portable virtual machines (github.com)

Posted by binsquare 6 hours ago

200 points | 83 comments

binsquare 6 hours ago

Hello, I'm building a replacement for Docker containers: a virtual machine with the ergonomics of containers and subsecond start times.

I previously worked at AWS in the container space and with Firecracker. I realized the container is an unnecessary layer that slows things down, and that Firecracker was designed around AWS's org structure and use case.

So I ended up building a hybrid that combines the best of containers with the best of Firecracker.

Let me know your thoughts, thanks!

PufPufPuf 4 hours ago

Hey, this is super cool. I've been researching tech like this for my AI sandboxing solution and ended up with Lima+Incus: https://github.com/JanPokorny/locki

My problem with microVMs was that they usually won't run Docker/Kubernetes. I work on apps that consist of whole Kubernetes clusters, and I want the sandbox to contain all of that.

Does your solution support running k3s for example?

fqiao 3 hours ago

We will evaluate it. I created this issue to track it: https://github.com/smol-machines/smolvm/issues/150

Really appreciate the feedback!

topspin 4 hours ago

What is the status of supporting live migration?

That's the one feature of similar systems that always gets left out. I understand why: it's not a priority for "cloud native" workloads. The world, however, has workloads that are not cloud native, because going cloud native comes at a high cost, and it always will. So if you'd like a real value-add differentiator for your micro-VM platform (beyond what I believe you already have), there you go.

Otherwise this looks pretty compelling.

genxy 3 hours ago

It helps if you offer a concrete use case: how large the heap is, what kind of blackout period you can handle, whether the app can handle all of its open connections being destroyed, etc. The more an app can handle resetting some of its own state, the easier LM is going to be to implement. If your workload works with CRIU (https://github.com/checkpoint-restore/criu), you could do this already.

By what I assume is your definition, there are plenty of "non cloud native" workloads running on clouds that need live migration. Azure and GCP use LM behind the scenes to give the illusion of long uptime hosts. Guest VMs are moved around for host maintenance.
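To make the heap-size and blackout-period trade-off concrete, here is a toy model of pre-copy live migration, the iterative scheme QEMU/KVM uses by default: copy all guest RAM while the VM keeps running, then keep re-sending the memory dirtied during the previous pass until the remainder is small enough to send during a short pause. This is a sketch under simplifying assumptions (constant bandwidth, constant dirty rate, no compression, no CPU throttling), and `precopy_rounds` and its parameters are hypothetical names, not part of smolvm, libvirt, or QEMU.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One pre-copy pass: how much memory was sent and how long it took."""
    index: int
    sent_mib: float
    seconds: float


def precopy_rounds(mem_mib: float, dirty_mib_s: float, bw_mib_s: float,
                   max_downtime_s: float, max_rounds: int = 30):
    """Toy model of pre-copy live migration (hypothetical helper).

    Returns (rounds, final_downtime_s, converged). Assumes a constant
    network bandwidth and a constant guest dirty rate, both in MiB/s.
    """
    if bw_mib_s <= 0:
        raise ValueError("bandwidth must be positive")
    rounds: list[Round] = []
    to_send = float(mem_mib)  # first pass copies all guest RAM while it runs
    for i in range(max_rounds):
        seconds = to_send / bw_mib_s
        if seconds <= max_downtime_s:
            # Remainder fits in the blackout budget: pause the guest and finish.
            return rounds, seconds, True
        rounds.append(Round(i, to_send, seconds))
        # Memory dirtied while this pass was in flight must be re-sent,
        # but never more than the guest actually has.
        to_send = min(float(mem_mib), dirty_mib_s * seconds)
    # Gave up: stopping now would cost a blackout for the whole remainder.
    return rounds, to_send / bw_mib_s, False


if __name__ == "__main__":
    rounds, downtime, ok = precopy_rounds(16384, 200, 1000, 0.3)
    print(f"rounds={len(rounds)} downtime={downtime:.3f}s converged={ok}")
```

With 16 GiB of RAM, 1000 MiB/s of bandwidth, a 200 MiB/s dirty rate, and a 300 ms blackout budget, the model converges after 3 pre-copy passes with about 0.131 s of downtime. If the dirty rate exceeds the bandwidth, the remainder never shrinks and it does not converge at all, which is why the workload's memory behavior matters as much as the hypervisor's support.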

topspin 3 hours ago

"Azure and GCP use LM behind the scenes"

As does OCI, and (relatively recently) AWS. That's a lot of votes.

Use case: some legacy database VM needs to move because the host needs maintenance, the database storage (as opposed to the database software) is on an iSCSI/NFS/NVMe-oF array somewhere, and clients are just smart enough to transparently handle a brief disconnect/reconnect (which is built into essentially every such database connection pool stack today).

Use case: a web app platform (node/spring/django/rails/whatever) with a bunch of cached client state needs to move because the host needs maintenance. The developers haven't done all the legwork to make the state survive restart, and they'll likely never get time needed to do that. That's essentially the same use case as previous. It's also rampant.

Use case: a long-running batch process (training, etc.) needs to move because reasons, ops can't wait for it to stop, and they can't kill it because time==money. It doesn't matter that it takes an hour to move because of a big heap, as long as the previous 100 hours isn't lost.

"as in how large the heap is"

That's an undecidable moving target, so let the user worry about it. Trust them to figure out what is feasible given the capabilities of their hardware and talent. They'll do fine if you provide the mechanism. I've been shuffling live VMs between hosts for 10+ years successfully, and Qemu/KVM has been capable of it for nearly 20, never mind VMware.

"CRIU"

Dormant, and still containers. Also, it's re-solving solved problems once you're running in a VM, but with more steps.

fqiao 3 hours ago

Really appreciate the suggestion! By "live migration", do you mean keeping the existing files and migrating them elsewhere with the VM?

Thanks

topspin 3 hours ago

I mean making any given VM stop on host A and appear on host B; e.g. standard Qemu/KVM:

    virsh migrate --live GuestName DestinationURL

This is feasible when network storage is available and useful when a host needs to be drained for maintenance.

sureglymop 1 hour ago

It's also feasible without network storage, --copy-storage-all will migrate all disks too.

fqiao 3 hours ago

I see. Right now a smolvm VM can be stopped, "packed" (think of it as compressed), and restarted on a different host. Files on the disks are preserved, but memory snapshotting is still hard, tbh.

lacoolj 2 hours ago

What percentage of this code was written by LLM/AI?

binsquare 2 hours ago

For myself, I'd estimate ~50%.

It wasn't useful for things it hadn't been trained on. But now that I have the core functionality in place, it's been a great help.

RALaBarge 1 hour ago

Hey mathematician, how much of this formula did you calculate with an abacus instead of a calculator?

anthk 38 minutes ago

Hey 'software engineer', how much of an LLM's output is actually reproducible, compared with the output of a calculator or any programming language given the same input in different sessions?

weird-eye-issue 21 minutes ago

Why are you so concerned about the LLM producing the exact same code across different sessions? Seems like a really weird thing to focus on. Why aren't you focused on things like security, maintainability, UI/UX, performance?

harshdoesdev 6 hours ago

+1. I built something similar called shuru.run because I wanted an easy way to set up microVM sandboxes to run some of my AI apps, and Firecracker wasn't available for macOS (and, as you said, it is just too heavy for normal user-level workloads).

sahil-shubham 5 hours ago

Nice work on Shuru! I remember looking at it when I was researching this space. You went with a Rust wrapper on Apple’s Virtualization framework, right?

I have been working on something similar but on top of firecracker, called it bhatti (https://github.com/sahil-shubham/bhatti).

I believe anyone with a spare linux box should be able to carve it into isolated programmable machines, without having to worry about provisioning them or their lifecycle.

The documentation’s still early, but I have been using it for orchestrating parallel work (with deploy previews), offloading browser automation for my agents, etc. An auction-bought Hetzner server is serving me quite well :)

harshdoesdev 4 hours ago

bhatti's CLI looks very ergonomic! Great job!

Also, yes, shuru was (and still is) a wrapper over Virtualization.framework, but it now supports Linux too (a wrapper over KVM, lol).

fqiao 5 hours ago

Yes, having a lightweight solution for local devices is one primary goal of the design. Another is making it easy to host, whether self-hosted or managed.

JuniperMesos 1 hour ago

What were the biggest challenges in designing the VM to have subsecond start times? And what are the current bottlenecks to decreasing the start time even further?

binsquare 1 hour ago

No special programming tricks were used.

Linux was built in the 90s. Hardware has improved more than 1000x since then, yet Linux virtual machine startup times stayed roughly the same.

Turns out we kept adding junk to the Linux kernel and boot-up operations.

So all I did was cut unnecessary parts, keeping only what was needed for it to still work.

That also got boot times under 1s. The kernel changes are the 10 commits I made; you can verify them here: https://github.com/smol-machines/libkrunfw

There's probably more fat to cut to be honest.
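If you want to check a sub-second cold-start claim on your own hardware, one rough approach is to time the whole launch, run, and exit cycle several times and look at the median. This is a minimal sketch, not the authors' benchmark: `time_cold_starts` is a hypothetical helper, it measures end-to-end wall time (VM boot, the command itself, and shutdown) rather than kernel boot alone, and repeated runs may benefit from warm page caches.

```python
import statistics
import subprocess
import sys
import time


def time_cold_starts(cmd: list[str], runs: int = 5) -> list[float]:
    """Launch `cmd` `runs` times, returning each wall-clock duration in seconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit,
        # so a failed launch is never recorded as a fast one.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Command to time; defaults to a bare Python interpreter as a baseline.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    samples = time_cold_starts(cmd)
    print(f"median {statistics.median(samples):.3f}s, "
          f"max {max(samples):.3f}s over {len(samples)} runs")
```

Saved under any name (say `coldstart.py`), it could time the packed binary from the `smolvm pack create` example in this thread with `python3 coldstart.py ./python312 run -- python3 --version`. Comparing that against the interpreter-only baseline gives a rough idea of how much the VM adds.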

thm 5 hours ago

You could add OrbStack to the comparison table.

fqiao 5 hours ago

Will do. Thanks for the suggestion!

sdrinf 5 hours ago

Hi, great project! Windows support is sorely lacking, though. As someone working a lot with sandboxed LLMs right now, I find the options for sandboxing on Windows _extremely lacking_. Any plans to support it?

fqiao 5 hours ago

Hey, thanks so much! Yeah, we will definitely add Windows support later. We are exploring how to get this working with WSL and will release it ASAP. Stay tuned and thanks!

binsquare 5 hours ago

Yeah, it's in my mind.

WSL2 runs a linux virtual machine. Need to take some time and care to wire that up, but definitely feasible.

gavinray 4 hours ago

The feature that lets you create self-contained binaries seems like a potentially simpler way to package JVM apps than GraalVM Native.

Probably a lot of other neat use cases for this, too:

  smolvm pack create --image python:3.12-alpine -o ./python312
  ./python312 run -- python3 --version
  # Python 3.12.x — isolated, no pyenv/venv/conda needed

binsquare 4 hours ago

Yeah, it's analogous to Electron.

Electron ships your web app bundled with a browser.

Smol machines ship your software packaged with a linux vm. No need for dependency management or compatibility issues because it is baked in.

I think this is how Codex or Claude Code should be shipped by default, to avoid any isolation issues tbh

fqiao 2 hours ago

Yeah, I guess everybody shares the experience of "I messed up my dev env", right? We want this "machine" to be shippable, meaning that once it is configured correctly, it can be shared with anyone and used right away.

mrbluecoat 3 hours ago

Can .smolmachine be digitally signed and self authenticate when run? Similar to https://docs.sylabs.io/guides/main/user-guide/signNverify.ht...

sureglymop 1 hour ago

What I really like about containers is being able to spin one up quickly without having to specify resources (e.g. a RAM limit). I hope this lets me do that too.

chwzr 1 hour ago

I see the alpine and python:3.12-alpine images in your CLI docs. Where do these come from? Is it a Docker-like registry, or are these built in? Can I create my own images, or is this purely done with the smolfile? Is there an Ubuntu image available?

Looks really nice, btw. Hot-resizing memory/CPU would be nice. This could then become nice tech for a one-backend-per-customer infra orchestrator.

cr125rider 6 hours ago

Great job with the comparison table. Immediately I was like “neat sounds like firecracker” then saw your table to see where it was similar and different. Easy!

Nice job! This looks really cool

fqiao 5 hours ago

Thanks so much

simonreiff 2 hours ago

Hey, this is pretty neat! I would definitely try using this for benchmarks and other places where I need strong isolation, as Docker is just too bloated and slow, but sadly I don't think I can run this natively on my Windows laptop. I hope you extend it to WSL! Good luck and congrats on the launch.

fqiao 2 hours ago

Hey, thanks so much for the feedback. Yeah, try it and let us know. We have a Discord if you want to join; either way, feel free to report any issues you find to us on GitHub or Discord.

Cheers!

akoenig 3 hours ago

smolvm is awesome. The team is highly responsive and very experienced. They clearly know what they’re doing.

I’m currently evaluating smolvm for my project, https://withcave.ai, where I’m using Incus for isolation. The initial integration results look very promising!

indigodaddy 1 hour ago

This looks super awesome. Very excited about you potentially open sourcing it, as I’d like to customize/extend it a bit for certain use cases. Re: smolvm vs Incus, I think even if smolvm works great for it, why not keep Incus as an option for people who want to use cave on VMs that don’t have access to /dev/kvm (e.g. the user can pick either Incus or smolvm for their cave deployment)?

fqiao 3 hours ago

Can't thank you enough for this! Let's work together to see how we can make this easier for cave!

irickt 3 hours ago

Is there a relation to the similarly-purposed and similarly-named https://github.com/CelestoAI/SmolVM

binsquare 2 hours ago

No relation. They build a sandboxing service using Firecracker.

I build a virtual machine that is an alternative to firecracker and containers.

timsuchanek 2 hours ago

This is very exciting. It enables a cross platform, language agnostic plugin system, especially for agents, while being safe in a VM.
