We reduced a container image from 800GB to 2GB (sealos.io)
Posted by untrimmed 10/27/2025
90 points | 84 comments
untrimmed 10/27/2025|
Our platform is designed to solve a very specific workflow, and the DevBox is only the first step in that process.

Our users need to connect their local VS Code, Cursor, or JetBrains IDEs to the cloud environment. The industry-standard extensions for this only speak the SSH protocol. So, to give our users the tools they love, the container must run an SSHD to act as the host.
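
Roughly, the DevBox image just bakes in an SSH daemon. A minimal sketch (the base image and packages here are illustrative, not our exact setup):

    FROM ubuntu:22.04
    # install the SSH daemon that the IDE remote extensions connect to
    RUN apt-get update && apt-get install -y openssh-server && mkdir -p /run/sshd
    # user accounts and keys omitted; run sshd in the foreground as the main process
    CMD ["/usr/sbin/sshd", "-D"]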

We aren't just a CDE like Coder or Codespaces. We're trying to provide a fully integrated, end-to-end application lifecycle in one place.

The idea is that a developer on Sealos can:

1. Spin up their DevBox instantly.
2. Code and test their feature in that environment (using their local IDE).
3. Then, from that same platform, package their application into a production-ready, versioned image.
4. And finally, deploy that image directly to a production Kubernetes environment with one click.

That "release" feature was how we let a developer "snapshot" their entire working environment into a deployable image without ever having to write a Dockerfile.

topaz0 11/2/2025||
Agree with other commenters that this seems like a bad idea. Why on earth should the release image contain all of the cruft of development?? Why on earth should it contain historical versions of all that cruft??
macNchz 11/2/2025||
This does seem bonkers to me. All but guaranteed to have worse issues than bloated container images in the future.
chuckadams 11/2/2025|||
The irony is that Kubernetes already provides a "ssh into any container" ability, and it's provided directly by k8s, no sshd needed (it's not the ssh protocol but it's good enough to get a shell). Not sure it's advisable to do with any user but an admin, but the standard workflow with k8s is not to shell into running containers anyway, it's to rebuild the container and redeploy the pod.
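
For reference, the built-in way looks like this (pod and container names are placeholders):

    # interactive shell in a running container over the Kubernetes API, no sshd in the image
    kubectl exec -it my-pod -c my-container -- /bin/sh
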
otterley 11/2/2025|||
Who are you, exactly? There is practically no publicly available information about your company, other than that it appears to be held by a Chinese entity called Labring.
thaumasiotes 11/2/2025||
What's up with the images that are supposed to appear in the article? They appear to be coded to load from "./images/containerd-high-disk-io-iotop.png", but https://sealos.io/blog/images/containerd-high-disk-io-iotop.... and https://sealos.io/images/containerd-high-disk-io-iotop.png both fail.

(And indeed, the images are broken in Firefox and Edge. Is there another browser where they're not broken?)

andai 11/2/2025||
I'm not a sysadmin but doesn't the root cause sound like a missing fail2ban or something? (Sounds like a whole bunch of problems stacked on top of each other honestly.)
Uvix 11/2/2025|
Yes, the article does list multiple root causes, including that one.
solatic 11/2/2025||
Reliable systems require hard limits imposed by designers. When systems hit the hard limits, it's a sign somebody's assumptions are wrong: either the designer built too small, or there's some bug pushing up against the hard limit. Either you catch the bug or make an intentional decision on how to scale further. This is basic engineering and is a requisite part of any undergraduate engineering degree worth its salt.

Allowing eight hundred gigabyte containers is gross incompetence. Trying to fix it by scaling the node disk from 2 TB to 2.5 TB is further evidence of incompetence. Understanding that you need to build a hard cap, but not concluding with action items to actually build one - instead just building monitoring for image size - is a clear sign to stay away.
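
One such cap already exists in Kubernetes: an ephemeral-storage limit on the pod, which has the kubelet evict the pod once its writable layer grows past the limit. A sketch (the number is arbitrary, and it does not cap committed image layers already stored on the node):

    # pod spec fragment: evicted once writable layer + emptyDir usage exceeds the limit
    resources:
      limits:
        ephemeral-storage: "20Gi"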

It boggles my mind that the author could understand copy-on-write filesystem semantics but can't imagine how to engineer actual size limits on said filesystem. How is that possible?

.... oh right, the blogpost is LLM slop. So nobody knows what the author actually learned.

throwaway106382 11/2/2025||
This is crazy. And they created an entire business around containers without even understanding the basics of how building them works? Yikes.
rekabis 11/2/2025||
Defence № 2 and № 3 are ones I would do everywhere as a knee-jerk reaction, regardless of any justification to not bother with them. It’s just an ingrained habit at this point.

It’s № 1 which I could not have guessed at or gone for. Good write-up, love the transparency.

KronisLV 11/2/2025||
Images don't seem to be working:

https://sealos.io/_next/image?url=.%2Fimages%2Fcontainerd-hi...

https://sealos.io/_next/image?url=.%2Fimages%2Fbloated-conta...

Either way, hope the user was communicated with or alerted to what's going on.

At the same time, someone said that 800 GB container images are a problem in and of themselves no matter the circumstances, and they got downvoted for saying so - yet I mostly agree.

Most of mine are 50-250 MB at most, and even if you need big ones with software that's gigabytes in size, you will still be happier if you treat them as something largely immutable. I've never had odd issues with them thanks to this. If you really care about data persistence, you can use volumes/bind mounts; if you don't, just throw things into tmpfs.
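
Something like this (names and paths are just examples):

    # image stays immutable: persistent data in a named volume, scratch data in tmpfs
    docker run --rm \
      --mount type=volume,source=appdata,target=/var/lib/app \
      --mount type=tmpfs,target=/tmp \
      myapp:1.2.3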

I'm not sure whether treating containers as something long lived with additional commits/layers is a great idea, but if it works for other people, then good for them. Must be a pain to run something so foundational for your clients, though, because you'll be exposed to most of the edge cases imaginable sooner or later.

reddozen 11/2/2025||
Is it spooky that they said they looked inside a customer's image to fix this? A bunch of engineers just had access to their customer's intellectual property, security keys, git repos, ...
trenchpilgrim 11/2/2025||
If you are adding security keys and git repos to your final shipped image you are doing things very wrong - a container image is literally a tarball and some metadata about how to run the executables inside. Even if you need that data to build your application you should use a multi-stage build to include only the final artifacts in the image you ship.
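
Rough sketch (base images and paths are placeholders):

    # build stage: toolchain, source, and intermediate artifacts live only here
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

    # final stage: only the compiled binary ships in the image
    FROM gcr.io/distroless/static-debian12
    COPY --from=build /out/app /app
    ENTRYPOINT ["/app"]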

For stuff like security keys you should typically add them as build --args-- secrets, not as content in the image.

Ysx 11/2/2025|||
> For stuff like security keys you should typically add them as build args, not as content in the image.

Build args are content in the image: https://docs.docker.com/reference/build-checks/secrets-used-...

hiddew 11/2/2025||||
> For stuff like security keys you should typically add them as build args, not as content in the image.

Do not use build arguments for anything secret. The values are committed into the image layers.

never_inline 11/3/2025||
Yep. The only valid use case I can think of is using the secret for something else, e.g. connecting to an internal package registry, in which case the secret mounts may help.
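
Roughly like this (the secret id, file, and npm usage are just an example, assuming your .npmrc references ${NPM_TOKEN}):

    # Dockerfile: the token is mounted only for this RUN step and never written into a layer
    RUN --mount=type=secret,id=npm_token \
        NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci

    # build command supplying the secret from a local file (needs BuildKit)
    docker build --secret id=npm_token,src=./npm_token.txt .
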
tecleandor 11/2/2025||||
Yeah, typically, but in this case they're committing and committing into the container image, and saving changes from running software. Not only that, they're committing log files into the image, which is crazy.

The thing here is they're using Docker container images as if they were VM disks, and they end up with images with almost 300 layers, like in this case. I think LXC or VMs would be a better fit for this (but I don't know if they've tested that or why they're using Docker).
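
You can see the pile-up directly (names are made up):

    # every commit of the running container stacks another layer on top
    docker commit devbox-1234 devbox:snapshot-n
    # layer count and total size only grow; files deleted later still exist in lower layers
    docker history devbox:snapshot-n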

cowsandmilk 11/2/2025|||
That’s nice, but you still shouldn’t be looking into your customer’s containers.
adastra22 11/2/2025||
How else do they diagnose issues? Sorry to break it to you, this is absolutely standard across the entire industry.
stackskipton 11/2/2025||
Evict the containers, let the customer know and get customer approval to work with their images.
trenchpilgrim 11/2/2025|||
What about this case where the container was working but was consuming overhead due to an infrastructure issue? Customer hasn't done anything wrong. If you stop their containers they'll likely leave for a competitor.
adastra22 11/2/2025|||
You have approval in the terms of service. This is absolutely known and expected across the entire industry. It's why your employees have clauses in their contracts about respecting third party confidentiality.
otterley 11/2/2025||
I did a little research on this company. It’s related to (or wholly owned by) a Chinese entity called Labring. LinkedIn shows practically nobody related to the company other than its marketing team. Something smells incredibly fishy.
SJC_Hacker 11/2/2025||
I did something on a smaller scale by ripping out large parts of Boost, which was nearly 50% of the image size
BoredPositron 11/2/2025|
If your image is 800GB you are doing something wrong in the first place.
adastra22 11/2/2025|
You didn't read the article.
BoredPositron 11/2/2025||
I did, and the image had problems to begin with. Whether it's a bad image, a bad configuration of your visor, or something in the image doesn't matter. If your images can bloat to over 800GB you are doing the basics wrong. Hint: using commit to create your images...