Posted by untrimmed 10/27/2025

We reduced a container image from 800GB to 2GB (sealos.io)
87 points | 83 comments | page 2
solatic 1 day ago|
Reliable systems require hard limits imposed by designers. When systems hit the hard limits, it's a sign somebody's assumptions are wrong: either the designer built too small, or there's some bug pushing up against the hard limit. Either you catch the bug or make an intentional decision on how to scale further. This is basic engineering and is a requisite part of any undergraduate engineering degree worth its salt.

Allowing eight hundred gigabyte containers is gross incompetence. Trying to fix it by scaling the node disk from 2 TB to 2.5 TB is further evidence of incompetence. Understanding that you need to build a hard cap, but not concluding with action items to actually build one - instead just building monitoring for image size - is a clear sign to stay away.

It boggles my mind that the author could understand copy-on-write filesystem semantics but can't imagine how to engineer actual size limits on said filesystem. How is that possible?

.... oh right, the blogpost is LLM slop. So nobody knows what the author actually learned.
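
For illustration only, a rough sketch of the kind of hard cap being described, using plain Docker tooling; the 2 GB threshold, image name, and quota value are made up, and the per-container quota depends on a backing filesystem that supports it:

    # Reject oversized images before they ever reach a node, e.g. as a CI gate:
    size=$(docker image inspect --format '{{.Size}}' myapp:latest)
    if [ "$size" -gt $((2 * 1024 * 1024 * 1024)) ]; then
      echo "image exceeds the 2GB hard cap" >&2
      exit 1
    fi

    # Cap each container's writable layer at the filesystem level
    # (requires e.g. overlay2 on xfs mounted with pquota):
    docker run --storage-opt size=10G myapp:latest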

hsbauauvhabzb 2 days ago||
This seems very much like a ‘we misconfigured our containers, then we realised, then we fixed it, then we blogged about it’ post of very little value.
pandemic_region 2 days ago|
Right? Blog posts like these make me question competence instead of attributing it.
chuckadams 2 days ago||
Competence comes from experience built on lots of screw-ups. I like when people blog their mistakes.
hsbauauvhabzb 6 hours ago||
But if your computer doesn’t work, and you inspect it and it turns out you forgot to turn it on, does that really qualify for a blog post?

Log rotation and disk-consuming logs are a tale as old as time.
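
For illustration, the classic knobs (image name, paths, and sizes are made up, and the article's stack may differ):

    # Host-side: Docker's default json-file log driver grows without bound
    # unless rotation is opted into.
    docker run --log-opt max-size=10m --log-opt max-file=3 myapp:latest

    # Logs written inside the container's own filesystem are a separate problem:
    # rotate them in-container, or point them at a volume/tmpfs so they never
    # land in the writable layer (or, as in the article, in committed layers).
    docker run --tmpfs /var/log/myapp myapp:latest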

andai 2 days ago||
I'm not a sysadmin but doesn't the root cause sound like a missing fail2ban or something? (Sounds like a whole bunch of problems stacked on top of each other honestly.)
Uvix 2 days ago|
Yes, the article does list multiple root causes, including that one.
throwaway106382 1 day ago||
This is crazy. And they created an entire business around containers without even understanding the basics of how building them works? Yikes.
KronisLV 2 days ago||
Images don't seem to be working:

https://sealos.io/_next/image?url=.%2Fimages%2Fcontainerd-hi...

https://sealos.io/_next/image?url=.%2Fimages%2Fbloated-conta...

Either way, I hope the user was contacted or at least alerted to what's going on.

At the same time, someone said that 800 GB container images are a problem in and of themselves no matter the circumstances, and they got downvoted for saying so - yet I mostly agree.

Most of mine are 50-250 MB at most, and even if you need big ones with software that's gigabytes in size, you will still be happier if you treat them as something largely immutable. I've never had odd issues with them thanks to this. If you really care about data persistence, you can use volumes/bind mounts; if you don’t, just throw things into tmpfs.
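
A rough sketch of that split, with made-up volume names and paths:

    # Keep the image immutable and push mutable state elsewhere:
    #   - a named volume for data that must persist,
    #   - a read-only bind mount for host-provided config,
    #   - tmpfs for scratch space that never touches the writable layer.
    docker run \
      -v app-data:/var/lib/app \
      --mount type=bind,src=/srv/app-config,dst=/etc/app,readonly \
      --tmpfs /tmp:size=256m \
      myapp:latest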

I'm not sure whether treating containers as something long-lived with additional commits/layers is a great idea, but if it works for other people, then good for them. It must be a pain to run something so foundational for your clients, though, because you'll be exposed to most of the edge cases imaginable sooner or later.

reddozen 2 days ago||
Is it spooky that they said they looked inside a customer's image to fix this? A bunch of engineers just had access to their customer's intellectual property, security keys, git repos, ...
trenchpilgrim 2 days ago||
If you are adding security keys and git repos to your final shipped image you are doing things very wrong - a container image is literally a tarball and some metadata about how to run the executables inside. Even if you need that data to build your application you should use a multi-stage build to include only the final artifacts in the image you ship.

For stuff like security keys you should typically add them as build --args-- secrets, not as content in the image.
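
A minimal sketch of that pattern, using Go and BuildKit purely as an example; the base images, paths, and secret id are made up:

    # syntax=docker/dockerfile:1
    # Build stage: toolchain, sources, and the secret exist only here.
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY . .
    # The secret is mounted just for this RUN step; it is never written to a layer.
    RUN --mount=type=secret,id=netrc,target=/root/.netrc go build -o /out/app .

    # Final stage: only the built artifact ships.
    FROM gcr.io/distroless/static-debian12
    COPY --from=build /out/app /app
    ENTRYPOINT ["/app"]

    # Built with something like:
    #   docker build --secret id=netrc,src=$HOME/.netrc -t myapp .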

Ysx 2 days ago|||
> For stuff like security keys you should typically add them as build args, not as content in the image.

Build args are content in the image: https://docs.docker.com/reference/build-checks/secrets-used-...

hiddew 2 days ago||||
> For stuff like security keys you should typically add them as build args, not as content in the image.

Do not use build arguments for anything secret. The values are committed into the image layers.
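
Easy to verify on any image built that way; the name is illustrative, and this assumes an ARG called API_KEY was consumed by a RUN step:

    # Build args used by a RUN instruction are recorded in the layer history:
    docker history --no-trunc myapp:latest | grep -i api_key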

never_inline 23 hours ago||
Yep. The only valid use case I can think of is using the secret for something else, e.g. connecting to an internal package registry, in which case secret mounts may help.
tecleandor 2 days ago||||
Yeah, typically, but in this case they're committing and committing into the container image, and saving changes from running software. Not only that, they're committing log files into the image, which is crazy.

The thing here is that they're using Docker container images as if they were VM disks, and they end up with images with almost 300 layers, like in this case. I think LXC or VMs would be a better fit for this (but I don't know if they've tested that, or why they're using Docker).
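
The drift is easy to see with stock tooling (container ID and tag are made up):

    # Every "save the running container" cycle stacks another layer, and none
    # are ever removed:
    docker commit 3f2c9a1b myapp:snapshot
    # Counting layers makes the growth visible (cf. the ~300-layer images above):
    docker image inspect --format '{{len .RootFS.Layers}}' myapp:snapshot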

cowsandmilk 2 days ago|||
That’s nice, but you still shouldn’t be looking into your customer’s containers.
adastra22 2 days ago||
How else do they diagnose issues? Sorry to break it to you, this is absolutely standard across the entire industry.
stackskipton 1 day ago||
Evict the containers, let the customer know and get customer approval to work with their images.
adastra22 1 day ago|||
You have approval in the terms of service. This is absolutely known and expected across the entire industry. It's why your employees have clauses in their contracts about respecting third party confidentiality.
trenchpilgrim 1 day ago|||
What about this case where the container was working but was consuming overhead due to an infrastructure issue? Customer hasn't done anything wrong. If you stop their containers they'll likely leave for a competitor.
otterley 1 day ago||
I did a little research on this company. It’s related to (or wholly owned by) a Chinese entity called Labring. LinkedIn shows practically nobody related to the company other than its marketing team. Something smells incredibly fishy.
SJC_Hacker 2 days ago||
I did something on a smaller scale by ripping out large parts of Boost, which was nearly 50% of the image size
apexalpha 1 day ago||
The title makes it seem like 800GB images are a normal occurrence: they are not.

2GB is a perfectly normal size for a Docker image. If anything, it's a bit bloated.

compootr 1 day ago||
this reads like it was written by a clanker
rekabis 1 day ago|
Defences № 2 and № 3 are ones I would apply everywhere as a knee-jerk reaction, regardless of any justification not to bother with them. It’s just an ingrained habit at this point.

It’s № 1 which I could not have guessed at or gone for. Good write-up, love the transparency.
