Posted by jakelsaunders94 6 days ago
I recommend against UFW.
firewalld is a much better pick in current year and will not grow unmaintainable the way UFW rules can.
firewall-cmd --permanent --set-default-zone=block
firewall-cmd --permanent --zone=block --add-service=ssh
firewall-cmd --permanent --zone=block --add-service=https
firewall-cmd --permanent --zone=block --add-port=80/tcp
firewall-cmd --reload
Configuration is backed by XML files in /etc/firewalld and /usr/lib/firewalld instead of the brittle pile of sticks that is the ufw rules files. Use the nftables backend unless you have your own reasons for needing legacy iptables.

Specifically for Docker, it is a very common gotcha that the container runtime can and will bypass firewall rules and open ports anyway. Depending on your configuration, the firewall rules in OP may not actually do anything to prevent Docker from opening incoming ports.
Newer versions of firewalld give an easy way to configure this via StrictForwardPorts=yes in /etc/firewalld/firewalld.conf.
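For reference, that's the whole change (it needs a recent firewalld release, 2.2+ if I remember right):

/etc/firewalld/firewalld.conf
StrictForwardPorts=yes

and then restart firewalld for it to take effect.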
In my own setup, the VM that hosts my Docker stuff has 10.0.10.11. It doesn't even have its own public IP, meaning I could actually expose to 0.0.0.0 if I wanted to, but things might change in the future, so it's a precaution. That IP is only reachable via WireGuard and by the other machines that share the same subnet, so reverse proxying with Caddy on a public IP is super easy.
Docker also has more traps, and not quite as obvious as this one. For example, it can change the private IP block it's using without telling you. I got hit by this once due to a clash with a private block I was using for some other purpose. There's a way to fix it in the config, but it won't affect already-created containers.
By the way, while we're here, a public service announcement: you probably do NOT need the userland proxy and can disable it.
/etc/docker/daemon.json
{ "userland-proxy": false }
Some quick searching yields generic advice about keeping everything updated or running in rootless mode.
{
"log-driver": "json-file",
"log-opts": {
"labels": "production_status",
"tag": "{{.ImageName}}|{{.Name}}|{{.ImageFullID}}|{{.FullID}}",
"env": "os,customer",
"max-size": "10m"
},
"bip": "172.17.1.1/24",
"default-address-pools": [
{"base": "172.17.0.0/16", "size": 24}
]
}

So you can create multiple addresses with multiple separate "domains" mapped statically in /etc/hosts, and allow multiple apps to listen on "the same" port without conflicts.
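For example (the names are made up, and any 127.x.y.z works):

/etc/hosts
127.0.0.2   app-one.test
127.0.0.3   app-two.test

Now app-one.test:80 and app-two.test:80 can be two completely different processes.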
..actually this is very weird. Are you saying you can bind to 127.0.0.2:80 without adding a virtual IP to the NIC? So the concept of "localhost" is really an entire class A network? That sounds like a network stack bug to me heh.
edit: yeah my route table on osx confirms it. very strange (at least to me)
You can do:
python3 -m http.server -b 127.0.0.1 8080
python3 -m http.server -b 127.0.0.2 8080
python3 -m http.server -b 127.0.0.3 8080
and all will be available.
Private network ranges don't really serve the same purpose: they can be routed, so you always have to consider conflicts and so on. But with 127/8 you are in your own world and don't have to worry about anything. You can also run tests where you need to expose more than 65k ports :)
You also have to remember that these things were likely established before DNS was even a thing, when IP space was considered so big that anyone could have a huge chunk of it, and it was mostly managed manually.
> Specifically for docker it is a very common gotcha that the container runtime can and will bypass firewall rules and open ports anyway.
Like I said in another comment: drop Docker, install podman. Also, the Docker Compose tool is a well-known exception to the compatibility story. (There is an unofficial podman-compose tool, but it is not feature complete, and quadlets are better anyway.)
I agree with approaching podman as its own thing though. Yes, you can build a Dockerfile, but buildah lets you build an efficient OCI image from scratch without needing root. For those interested, this document¹ explains how buildah compares to podman and docker.
1. https://github.com/containers/buildah/tree/main/docs/contain...
The docs you need for quadlets are basically here: https://docs.podman.io/en/latest/markdown/podman-systemd.uni...
The one gotcha I can think of not mentioned there is that if you run it as a non-root user and want it to run without logging in as that user, you need to: `sudo loginctl enable-linger $USER`.
If you don't vibe with quadlets, it's equally fine to do a normal systemd .service file with `ExecStart=podman run ...`, which quadlets are just convenience sugar for. I'd start there and then return to quadlets if/when you find that becomes too messy. Don't add new abstraction layers just because you can if they don't help.
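A minimal sketch of that plain-.service route, with the unit name and image as placeholders:

[Unit]
Description=myapp container
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/bin/podman run --rm --replace --name myapp -p 8080:80 docker.io/library/nginx:alpine
ExecStop=/usr/bin/podman stop myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target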
If you have a more complex service consisting of multiple containers you want to schedule as a single unit, it's also totally fine to combine systemd and compose by having `ExecStart=podman compose up ...`.
Do you want it to run silently in the background with control over autorestarts and log to system journal? Quadlets/systemd.
Do you want to have multiple containers scheduled together (or just prefer it)? Compose.
Do you want to manually invoke it and have the output in a terminal by default? CLI run or compose.
Of course, when you think about it, nobody expects a command to just survive logging out, but coming from Docker, you still have that expectation. And I wonder: am I supposed to be running this in tmux like the old days? No, you need to do a bunch of systemd/linger stuff. So, being already in systemd land, you keep searching and end up at quadlets, which are a new-ish thing with (last I checked) bad docs, replacing whatever was used before (which has good docs). Docs, I should add, that give you k8s PTSD. Quadlet, podlet and pods.
It seems that when podman deviates from Docker, it does so in the least ergonomic way possible. Or maybe I have been lobotomized by years and years of using Docker, or maybe my patience threshold is very low nowadays. But this has been my experience. I felt very stupid when I deployed something and it stopped after 5 minutes. I was ready to use podman, because it worked locally. And then it failed in production. Thanks, but no.
This is not a side effect of running rootless, it's a side effect of running systemd (or rather, systemd-logind).
loginctl enable-linger
systemctl --user enable podman.socket
loginctl enable-linger <USER>
?

Just a random thought, but if you can create a user on the host that has just the minimal set of systemd services your container needs enabled, you could apply it to that user.
But still, on a server that wouldn't make much sense IMHO, as the default user is usually the service user with a minimal set of services enabled. On a desktop, your default user is logged in anyway. So I think this isn't a real problem, tbh.
The only non-deprecated way of having your Compose services restart automatically is with Quadlet files, which are systemd unit files with extra options specific to containers. You need to manually translate your docker-compose.yml into one or more Quadlet files. Documentation for those leaves a lot to be desired too; it's just one huge itemized man page.
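For a rough idea of the translation, a single compose service becomes a .container file along these lines (names and image are placeholders):

~/.config/containers/systemd/myapp.container
[Unit]
Description=myapp

[Container]
Image=docker.io/library/nginx:alpine
PublishPort=8080:80

[Service]
Restart=always

[Install]
WantedBy=default.target

Then `systemctl --user daemon-reload` and `systemctl --user start myapp.service`; the generator turns the .container file into a regular service unit.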
Same as for docker, yes?
Networking is just better in podman.
That page does not address rootless Docker, which can be installed (not just run) without root, so it would not have the ability to clobber firewall rules.
In order to stop these attacks, restrict outbound connections from unknown / disallowed binaries.
This kind of malware in particular requires outbound connections to the mining pools. Others download scripts or binaries from remote servers, or try to communicate with their C2 servers.
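True per-binary egress filtering needs something like OpenSnitch or an LSM, but if each service runs as its own user you can get close with plain nftables. A sketch, with "webapp" as a made-up service user:

table inet egress {
  chain output {
    type filter hook output priority 0; policy drop;
    oif "lo" accept
    ct state established,related accept
    udp dport 53 accept  # DNS for everyone; tighten as needed
    meta skuid "webapp" tcp dport { 80, 443 } accept
  }
}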
On the other hand, removing exec permissions from /tmp, /var/tmp and /dev/shm is also useful.
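E.g. in /etc/fstab (note the last line trades away /var/tmp's survive-reboot semantics):

tmpfs  /tmp      tmpfs  defaults,noexec,nosuid,nodev,size=1g  0 0
tmpfs  /dev/shm  tmpfs  defaults,noexec,nosuid,nodev          0 0
/tmp   /var/tmp  none   bind,noexec,nosuid,nodev              0 0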
Sadly that's more of a duct-tape-or-plaster fix, because any serious malware will launch its scripts with a proper '/bin/bash /path/to/dropped/payload' invocation. A non-exec mount works reasonably well only against actual binaries dropped into those paths, and only because it's much less common to launch them with the lesser-known '/bin/ld.so /path/to/my/binary' stanza.
I also at one point suggested that the Debian installer should support configuring a noexec mount for /tmp, but it got rejected: too many packaging scripts depend on being able to run their various steps from /tmp (or, more correctly, $TMPDIR).
If this weren't the case, plenty of containers could probably have a fully read-only filesystem.
- Configuration management (ansible, salt, chef, puppet)
- Preconfigured images (NixOS, packer, Guix, atomic stuff)
For a one-off: pssh
You can also restrict outbound connections to cryptomining pools and malicious IPs, for example by using IOCs from VirusTotal or URLhaus (urlhaus.abuse.ch).
On ufw systems, I know what to do: if a port I need open is blocked, I run 'ufw allow'. It's dead simple, even I can remember it. And if I don't, I run 'ufw --help' and it tells me to run 'ufw allow'.
Firewall-cmd though? Any time I need to punch a hole in the firewall on a Fedora system, I spend way too long reading the extremely over-complicated output of 'firewall-cmd --help' and the monstrous man page, before I eventually give up and run 'sudo systemctl disable --now firewalld'. This has happened multiple times.
If firewalld/firewall-cmd works for you, great. But I think it's an absolutely terrible solution for anyone whose needs are, "default deny all incoming traffic, open these specific ports, on this computer". And it's wild that it's the default firewall on Fedora Workstation.
With the ufw GUI you need a single checkbox: block incoming connections.
Perhaps if you're doing more complicated things like bridging interfaces or rerouting traffic it would be more difficult to use than the alternatives, but for a simple whitelist it's extremely easy to configure and modify.
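And the CLI equivalent of that checkbox plus a whitelist is about three commands:

sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw allow 443/tcp
sudo ufw enable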
This sounds like great news. I followed some of the open issues about this on GitHub and it never really got a satisfactory fix. I found some previous threads on this "StrictForwardPorts": https://news.ycombinator.com/item?id=42603136.
It’s not like iptables was any better, but it was more intuitive, as it spoke about IPs and ports, not high-level arbitrary constructs such as zones and services defined in some XML file. And since firewalld uses iptables/nftables underneath, I wonder why I need a worse, leaky abstraction on top of what I already know.
I truly hate firewalld.
I’d love a Linux firewall configured with a sane config file and I think BSD really nailed it. It’s easy to configure and still human readable, even for more advanced firewall gateway setups with many interfaces/zones.
I have no doubt that Linux can do all the same stuff feature-wise, but oh god the UX :/
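To be fair to Linux, a hand-written /etc/nftables.conf gets reasonably close to the pf.conf experience. A default-deny sketch (ports and interfaces are just examples):

#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    iif "lo" accept
    ct state established,related accept
    tcp dport { 22, 80, 443 } accept
    icmp type echo-request accept
    icmpv6 type { echo-request, nd-neighbor-solicit, nd-neighbor-advert } accept
  }
}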
I have been using both Linux and FreeBSD for many decades, on many kinds of computers.
When comparing Linux with FreeBSD, I probably do not find anything more annoying on Linux than its networking configuration tools.
While I use Linux on my laptops and desktops and on some servers for computational purposes, on the servers that host networking services I much prefer FreeBSD, for the ease of administration.
Imagine container A, which exposes tightly-coupled services X and Y. Container B should be able to access only X; container C should be able to access only Y.
For some reason there just isn't a convenient way to do this with Docker or Podman. Last time I looked into it, it required manually juggling the IP addresses assigned to the container and having the service explicitly bind to them, which is just needlessly complicated. Can firewalls solve this?
And ufw supports nftables, btw. I think the real lesson is: write your own firewalls and make them non-permissive, then just template that shit with CaC.
I mean, there are some payload-over-payload things like GRE/VPE/VXLAN/VLAN or IPsec that need to be written in raw nft if you're using Foomuuri, but it works!
But I love the Shorewall approach, and your configuration gracefully encapsulates the Shorewall mechanics.
Disclaimer: I maintain the vim-syntax-nftables syntax-highlighter repo on GitHub.
I'm not even sure what to say, or think, or even how to feel about the frontend ecosystem at this point. I've been debating leaving the whole "web app" ecosystem as my main employment venture and applying to some places requiring C++. C++ seems much easier to understand than whatever the latest frontend fad is. /rant
Compare that to what we had in the late 00s and early 10s: we went through prototype -> mootools -> jquery -> backbone -> angularjs -> ember -> react, all in about 6 years. That's a new recommended framework every year. If you want to complain about fads and churn, hop on over to AI development; they have plenty.
Pick a solid technology (.NET, Java, Go, etc.) for the backend and use whatever you want for your frontend. Voila, fewer CVEs and less churn!
Unless you're running a static HTML export - e.g. not running the Next.js server, but serving through nginx or similar.
> If your app’s React code does not use a server, your app is not affected by this vulnerability. If your app does not use a framework, bundler, or bundler plugin that supports React Server Components, your app is not affected by this vulnerability.
https://react.dev/blog/2025/12/03/critical-security-vulnerab...
So if you have a backend that supports RSC, even if you don't use it, you can still be vulnerable.
GP said they only shipped front ends but that can mean a lot.
Edit: link
https://nvd.nist.gov/vuln/detail/CVE-2025-29927
That plus the most recent React one, and you have a culture that does not care about its customers but rather chases fads to feed greedy careers.
Unfortunately, there is no way to specify those `emptyDir` volumes as `noexec` [1].
I think the docker equivalent is `--tmpfs` for the `emptyDir` volumes.
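Right - and as far as I know you can pass mount flags inline there:

docker run --tmpfs /tmp:rw,noexec,nosuid,size=64m myimage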
Having such a nice layered build system with mountpoints, I'm amazed Docker made read-only an afterthought.
While LX-branded zones were a really cool tech demo, maintaining compatibility with Linux long-term would be incredibly painful and you're bound to find all sorts of horrific bugs in production. I believe that Oxide uses KVM to run their Linux guests.
Linux has always supported nested namespaces and you can run Docker containers inside LXC (or Incus) fairly easily. Note that while it does add some additional protection (in particular, it transparently adds user namespaces which is a critical security feature most people still do not enable in Docker) it is still the same technology as containers and so kernel bugs still pose a similar risk.
There are way more important things, like actually knowing that you are running software with a widely known RCE that, it seems, doesn't even use established mechanisms to sandbox itself.
The way the author describes it, Docker being the savior appears to be sheer luck.
Good security is layered.
Just use a firewall.
The firewall is there as a safeguard in case a service is temporarily misconfigured, it should certainly not be the only thing standing between your services and the internet.
I suggest people fuck around and find out, just limit your exposure. Spin up a VPS with nothing important, have fun, and delete it.
At some point we are all unqualified to use the internet and we used it anyway.
No one is going to die because your toy project got hacked and you are out $5 in credits, you probably learned a ton in the process.
I guess you can have the app server fully firewalled and have another bastion host acting as an HTTP proxy, both for inbound and outbound connections. But it's not trivial to set up, especially for the outbound scenario.
In this model, hosts don’t need any direct internet connectivity or access to public DNS. All outbound traffic is forced through the proxy, giving you full control over where each host is allowed to connect.
It’s not painless: you must maintain a whitelist of allowed URLs and HTTP methods, distribute a trusted CA certificate, and ensure all software is configured to use the proxy.
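On the client side it mostly comes down to the classic proxy variables plus a default-drop egress rule; the hostname and port here are placeholders:

/etc/environment
http_proxy=http://proxy.internal:3128
https_proxy=http://proxy.internal:3128
no_proxy=localhost,127.0.0.1

And plenty of software ignores those variables, which is exactly the "ensure all software is configured to use the proxy" pain mentioned above.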
I know port scanners are a thing but the act of using non-default ports seems unreasonably effective at preventing most security problems.
It's security through obscurity, which puts you out of view of the vast majority of the chaos of the internet. It by no means protects you from all threats.
I've done docker pull a few times based on some web post (which looked reasonable) and detected apps/scripts inside the container connecting to some .ru sites, either immediately or a few days later...
I've been doing it for a really long time now, and I'm still not sure whether it has any benefit or it's just an umbrella in a sideways storm.
I don't think it's wrong; it's just not the same as, e.g., using a YubiKey.
My sshd only listens on the VPN interface
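Concretely, that's one line in /etc/ssh/sshd_config (the address is a stand-in for whatever your VPN interface has):

ListenAddress 10.8.0.1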
App servers run Docker, with images that run a single executable (no OS, no shell) and strict CPU and memory limits. Most of my apps only require very limited temporary storage, so usually there's no need to mount anything. So good luck executing anything in there.
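For anyone wanting to replicate this, the docker run flags look roughly like the following; the limits are examples to tune per app:

docker run --read-only --tmpfs /tmp:rw,noexec,nosuid,size=16m \
  --cap-drop=ALL --security-opt=no-new-privileges \
  --memory=256m --cpus=0.5 --pids-limit=100 myapp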
Way back in the day, I used to run WordPress sites. They would get hacked monthly in every possible way. I learned so much, including the fact that often your app is your threat. With WordPress, every plugin is a vector. Also, the ability to easily hop into an instance and rewrite running code (looking at you, scripting languages, incl. JS) is terrible. This motivated my move to Go. The code I compiled is what will run. Period.
Is that the case, though? My understanding was that even if I run a Docker container as root and the container is 100% compromised, there would still need to be a vulnerability in Docker for it to “attack” the host, or am I missing something?
The core of the problem here is that process isolation doesn't save you from whole classes of attack vectors or misconfigurations that open you up to nasty surprises. Docker is great, just don't think of it as a sandbox to run untrusted code.
Of course, if you have a kernel exploit you'd be able to break out (this is what gVisor mitigates to some extent), and nothing seems to really protect against rowhammer/memory-timing-style attacks (but they don't seem to be commonly used). Beyond this, the main misconfigurations seem to be overly broad volume bindings (e.g. something that allows access to the Docker control socket from inside the container, or an obviously stupid mount like mounting your root inside the container).
Am I missing something?
Docker is pretty much the same but supposedly more flimsy.
Both have non-obvious configuration weaknesses that can lead to escapes.
While I generally agree with the technical argument, I fail to see the threat model here. Is it that some external threat would have prior knowledge that an important target is in close proximity to a less hardened one? It doesn't seem viable to me for nation states to spend the expensive R&D to compromise hobbyist-adjacent services in a hope that they can discover more valuable data on the host hypervisor.
Once such expensive malware is deployed, there's a huge risk that all the R&D money is spent on potentially just reconnaissance.
Attacker now needs a Docker exploit and then a VM exploit before getting to the hypervisor (and, no, pwning the VM ain't the same as pwning the hypervisor).
Not only does it allow me to partition the host for workloads but I also get security boundaries as well. While it may be a slight performance hit the segmentation also makes more logical sense in the way I view the workloads. Finally, it's trivial to template and script, so it's very low maintenance and allows for me to kill an LXC and just reprovision it if I need to make any significant changes. And I never need to migrate any data in this model (or very rarely).
> but will not stop serious malware
Second, even if your Docker container is configured properly, the attacker gets to call themselves root and talk to the kernel. It's a security boundary, sure, but it's not as battle-tested as the isolation of not being root, or the isolation between VMs.
Thirdly, in the stock configuration processes inside a docker container can use loads of RAM (causing random things to get swapped to disk or OOM killed), can consume lots of CPU, and can fill your disk up. If you consider denial-of-service an attack, there you are.
Fourthly, there are a bunch of settings that disable the security boundary, and a lot of guides online will tell you to use them. Doing something in Docker that needs to access hot-plugged webcams? Hmm, it's not working unless I set --privileged - oops, there goes the security boundary. Trying to attach a debugger while developing and you set CAP_SYS_PTRACE? Bypasses the security boundary. Things like that.
Unfortunately, user namespaces are still not the default configuration with Docker (even though the core issues that made using them painful have long since been resolved).
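Enabling them is a one-liner, though existing images and volumes will show up re-owned under the remapped UIDs, so expect some friction:

/etc/docker/daemon.json
{ "userns-remap": "default" }

The "default" value makes Docker create and use a dedicated dockremap user for the mapping.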
Not necessarily a vulnerability per se. A bridged adapter, for example, lets you do a lot. A few years ago there was a story about a guy who got root in a container, and because the container used a bridged adapter he was able to intercept the traffic of account-info updates on GCP.
I disagree with other commenters here that Docker is not a security boundary. It's a fine one, as long as you don't disable the boundary, which is as easy as running a container with `--privileged`. I wrote about secure alternatives for devcontainers here: https://cgamesplay.com/recipes/devcontainers/#docker-in-devc...
The only serious company that I'm aware of which doesn't understand that is Microsoft, and the reason I know that is because they've been embarrassed again and again by vulnerabilities that only exist because they run multitenant systems with only containers for isolation.
It's all turtles, all the way down.
But for a typical case, VMs are the bare minimum to say you have a _secure_ isolation boundary, because the attack surface is way smaller.
https://support.broadcom.com/web/ecx/support-content-notific...
https://nvd.nist.gov/vuln/detail/CVE-2019-5183
https://nvd.nist.gov/vuln/detail/CVE-2018-12130
https://nvd.nist.gov/vuln/detail/CVE-2018-2698
https://nvd.nist.gov/vuln/detail/CVE-2017-4936
In the end you need to configure it properly and pray there are no escape vulnerabilities. The same standard you applied to containers to say they're definitely never a security boundary. Seems like you're drawing some pretty arbitrary lines here.
Imagine naming this executable "ls" or "echo" and someone having "." in their path (which is why you shouldn't): as soon as you run "ls" in this directory, you've run compromised code.
There are obviously other ways to get that executable to be run on the host, this just a simple example.
OTOH, if I had written such a script for Linux, I'd be looking to grab the contents of $(hist) $(env) $(cat /etc/{group,passwd})... then enumerate /usr/bin/, /usr/local/bin/, and the XDG_{CACHE,CONFIG} dirs - some plaintext credentials are usually there. Then $HOME/.{aws,docker,claude,ssh}.
Basically the attacker just needs to know their way around your OS. The script enumerating these directories is the 0777 script they were able to write from inside the root-access container.
Go and Rust tend to lend themselves to these more restrictive environments a bit better than other options.
Are you holding millions of dollars in crypto/sensitive data? Better assume the machine and data is compromised and plan accordingly.
Is this your toy server for some low-value things where nothing bad can happen besides a bit of embarrassment even if you do get hit by a container escape zero-day? You're probably fine.
This attack is just a large-scale automated attack designed to mine cryptocurrency; it's unlikely any human ever actually logged into your server. So cleaning up the container is most likely fine.
Thanks for mentioning it - but now... how does one deal with this?
* but if you’re used to bind-mounting, they’ll be a hassle
Edit: This is by no means comprehensive, but I feel compelled to point it out specifically for some reason: remember not to mount .git writable, folks! Write access to .git is arbitrary code execution as whoever runs git.
You might still want to tighten things up. Just adding on the "rootless" part - running the container runtime as an unprivileged user on the host instead of root - you also want to run npm/node as unprivileged user inside the container. I still see many defaulting to running as root inside the container since that's the default of most images. OP touches on this.
For rootless podman, this will run as a user with your current uid and map ownership of mounts/volumes:
podman run -u $(id -u) --userns=keep-id

Also, if you've been compromised, you may have a rootkit that hides itself from the filesystem, so you can't be sure of a file's existence through a simple `ls` or `stat`.
Honestly, citation needed. Very rare unless you're literally giving the container write access to /usr/bin or other binaries the host is running, letting it reconfigure your entire /etc, giving it access to sockets like Docker's, or some other insane level of overreach I doubt even the least educated Docker user would commit.
While of course they should be scoped properly, people act like some elusive 0-day container escape will get used on their Minecraft server or personal blog that otherwise has sane mounts, non-admin capabilities, etc. You aren't that special.
And a shocking number of tutorials recommend bind-mounting docker.sock into the container without any warning (some even tell you to mount it "ro", which is even funnier since that does nothing). I have an HN comment from ~8 years ago complaining about this.
"CVE-2025-66478 - Next.js/Puppeteer RCE)"
Now, JS sites everywhere are getting hacked.
JS has turned into the very thing it strived to replace.
It's good to see developers haven't changed. ;)
That said, do you have an image of the box or a container image? I'm curious about it.
I was lucky in that my DB backups were working, so all my persistence was backed up to S3. I think I could stand up another one in an hour.
Unfortunately I didn't keep an image no. I almost didn't have the foresight to investigate before yeeting the whole box into the sun!
From the root container, depending on volume mounts and capabilities granted to the container, they would enumerate the host directories and find the names of common scripts and then overwrite one such script. Or to be even sneakier, they can append their malicious code to an existing script in the host filesystem. Now each time you run your script, their code piggybacks.
Deleting and remaking the container will blow away all state associated with it. So there isn't a whole lot to worry about after you do that.
If you have decent network- or process-level monitoring, you're likely to find it, whereas you might not notice the vulnerable software itself, or some stealthier, more dangerous malware exploiting it.