Things Unix can do atomically (2010)

Posted by onurkanbkrc 18 hours ago

Things Unix can do atomically (2010) (rcrowley.org)

239 points | 89 comments | page 2

MintPaw 17 hours ago|

Not much apparently, although I didn't know about changing symlinks, that could be very useful.

jeffbee 8 hours ago||

I wonder why the author left out atomic writes with O_APPEND.

ozgrakkurt 8 hours ago||

This requires O_SYNC and O_DIRECT afaik.

Even then it is only some file systems that guarantee it and even then file size updating isn’t atomic afaik.

Not so sure about file size update being atomic in this case but fairly sure about the rest.

Matklad had some writing or video about this.

Also there is a tool called ALICE and authors of that tool have a white paper about this subject.

Also there was a blog post about how badger database fixed some issues around this problem.

jeffbee 7 hours ago||

I don't think any part of your post is right. Aside from NFS, there should not be filesystems where this doesn't work. If there are, those are just bugs. The flags you mentioned are not required or relevant. Setting the fd offset to the end of the file atomically is the entire purpose of O_APPEND.

ozgrakkurt 7 hours ago||

It depends on what you mean by atomic. If it is only writing to page cache and you are writing a small amount then yes?

If there is a failure like a crash or power outage etc. then it doesn’t work like that.

You might as well be pushing into an in-memory data structure and writing to disk at program exit in terms of reliability

jeffbee 7 hours ago||

You are projecting imaginary features onto O_APPEND and then hypothesizing that your imaginary features might not work.

POSIX says that for a file opened with O_APPEND "the file offset shall be set to the end of the file prior to each write." That's it. That's all it does.
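A minimal Python sketch of the behavior quoted above, for illustration only. The helper name `append_record` is hypothetical. It shows the one thing O_APPEND promises here: the offset is moved to the end of the file before each write, even if the descriptor was seeked elsewhere. It makes no claim about durability after a crash; nothing is fsynced.

```python
import os


def append_record(path: str, record: bytes) -> int:
    """Append one record using a single write() on an O_APPEND descriptor.

    Returns the number of bytes written. `append_record` is an
    illustrative helper, not part of any library.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Deliberately move the offset to the start of the file.
        # With O_APPEND the kernel resets it to end-of-file before the
        # write, so existing data is not overwritten.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.write(fd, record)
    finally:
        os.close(fd)
```

Calling it twice on the same path leaves both records in the file, in call order. Whether the data survives a power loss is a separate question that this sketch does not address.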

zbentley 8 hours ago||

Unsure. Aren’t there filesystems which make O_APPEND less durable than it’s specified to be, which might be interpreted to adversely affect atomicity? Could that be it?

andrewstuart 14 hours ago||

Anywhere there is atomic capability you can build a queuing application.

ta8903 16 hours ago||

Not technically related to atomicity, but I was looking for a way to do arbitrary filesystem operations based on some condition (like adding a file to a directory, and having some operation be performed on it). The usual recommendation for this is to use inotify/watchman, but something about it seems clunky to me. I want to write a virtual filesystem, where you pass it a trigger condition and a function, and it applies the function to all files based on the trigger condition. Does something like this exist?

zbentley 8 hours ago||

The challenge with that approach is memory: trigger conditions, if added irresponsibly, can result in unbounded memory and (depending on implementation) potentially linear performance degradation of filesystem operations as well. Unbounded kernel memory growth leads to stability or security risks.

That tradeoff is at the root of why most notify APIs are either approximate (events can be dropped) or rigidly bounded by kernel settings that prevent truly arbitrary numbers of watches. fanotify and some implementations of kqueue are better at efficiently triggering large recursive watches, but that’s still just a mitigation on the underlying memory/performance tradeoffs, not a full solution.

laz 12 hours ago|||

Sounds half baked. What context does this function run in? Is it an interpreted language or an executable that you provide?

Inotify is the way to shovel these events out of the kernel, then userspace process rules apply. It's maybe not elegant from your pov, but it's simple.

quesera 11 hours ago|||

I've used FUSE for something similar.

There are sample "drivers" in easily-modified python that are fast enough for casual use.

direwolf20 14 hours ago|||

are you asking for if statements?

if(condition) {do the thing;}

ta8903 13 hours ago||

I know this is trivial to do programmatically, but I was looking for a way this will be handled by the filesystem. For instance, if I have some processes generating log files, and I have a script that converts them to html, I wanted the script to be called every time a log file is updated, without having a daemon running in the background to monitor the directory, just some filesystem mount. This would have made some deployments easier.

Brian_K_White 16 hours ago||

incron

ta8903 13 hours ago||

Thanks, I didn't find this when I was looking for a solution to my problem. This is pretty much the exact solution for my use case, though for some reason inotify feels more complicated to me than some kind of filesystem mount solution.

klempner 15 hours ago||

Being from 2010, this document is of course missing the C11/C++11 atomics that replaced the need for compiler intrinsics or non-portable inline asm when "operating on virtual memory".

With that said, at least for C and C++, the behavior of (std::)atomic when dealing with interprocess interactions is slightly outside the scope of the standard, but in practice (and at least recommended by the C++ standard) (atomic_)is_lock_free() atomics are generally usable between processes.

senderista 5 hours ago|

That's right, atomic operations work just fine for memory shared between processes. I have worked on a commercial product that used this everywhere.

exac 17 hours ago|

Sorry, there is zero chance I will ever deploy new code by changing a symlink to point to the new directory.

silisili 6 hours ago||

I don't do devops/sysadmin anymore, so this would have been before the age of k8s for everything. But I once interviewed for a company hiring specifically because their deployment process lasted hours, and rollbacks even longer.

In the interview when they were describing this problem, I asked why they didn't just put all of the new release in a new dir, and use symlinks to roll forward and backwards as needed. They kind of froze and looked at each other and all had the same 'aha' moment. I ended up not being interested in taking the job, but they still made sure to thank me for the idea, which I thought was nice.

Not that I'm a genius or anything, it's something I'd done previously for years, and I'm sure I learned it from someone else who'd been doing it for years. It's a very valid deployment mechanism IMO, of course depending on your architecture.
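A rough Python sketch of the roll-forward/roll-back idea described above, assuming a POSIX filesystem. The function name `switch_release` and the `.tmp` suffix are made up for illustration. The new symlink is created under a temporary name and then renamed over the live one, so readers see either the old target or the new one.

```python
import os


def switch_release(current_link: str, release_dir: str) -> None:
    """Repoint `current_link` at `release_dir` via symlink + rename.

    `current_link` and the ".tmp" suffix are illustrative choices,
    not a convention taken from any particular deploy tool.
    """
    tmp_link = current_link + ".tmp"
    # Clear a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(release_dir, tmp_link)
    # os.replace() maps to rename(2) on POSIX, which swaps the
    # directory entry in a single step.
    os.replace(tmp_link, current_link)
```

Rolling back is the same call with the previous release directory as the target. Restarting or reloading the service afterwards is up to the deployment and is not shown.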

sholladay 17 hours ago|||

Why? What do you prefer to do instead?

gib444 16 hours ago||

Anything less than an entire new k8s cluster and switching over is just amateur hour obviously

1718627440 9 hours ago|||

Isn't that the standard way to do that? Why wouldn't you?

iberator 17 hours ago|||

why? it works and it's super clever. Simple command instead of some shit written in JS with docker trash

lloeki 17 hours ago||

Ah, the memories of capistrano, complete with zero-downtime unicorn handover

https://github.com/capistrano/capistrano/

10us 16 hours ago||

Still use php deployer each day and it works with symlinks as well. https://deployer.org/

bandrami 16 hours ago|||

Works pretty well for Nix

atmosx 15 hours ago|||

Worked pretty well in production systems, serving a huge amount of RPS (like ~5-10k/s) running on a LAMP stack monolith in five different geographical regions.

Just git branch (one branch per region because of compliance requirements) -> branch creates "tar.gz" with predefined name -> automated system downloads the new "tar.gz", checks release date, revision, etc. -> new symlink & php (serverless!!!) graceful restart and ka-b00m.

Rollbacks worked by pointing back to the old dir & restart.

Worked like a charm :-)

mananaysiempre 16 hours ago|||

And for Stow[1] before it, and for its inspiration Depot[2] before even that. It’s an old idea.

[1] https://www.gnu.org/software/stow/

[2] http://ftp.gregor.com/download/dgregor/depot.pdf

bandrami 16 hours ago||

I really liked stow. My toy distro back in the day was based on it.

slopusila 16 hours ago|||

that's how some phone OSes update the system (by having 2 read only fs)

that's how Chrome updates itself, but without the symlink part

dizhn 15 hours ago|||

No snapshotting at all? Thinking about it.. The filesystem does not support it I suppose.

LiamPowell 15 hours ago||

Android does use snapshots: https://source.android.com/docs/core/ota/virtual_ab

dizhn 12 hours ago||

Oh cool. I was a bit confused about not using snapshots and relying on symlinks but it couldn't be so simple. I guess it's just a simple userspace cow mount. https://source.android.com/docs/core/ota/virtual_ab#compress...

x4132 16 hours ago|||

not surprised about the chrome part, but pretty shocked at the phone OS part. I know APFS migration was done in this way, but wouldn't storage considerations for this be massive?

slopusila 15 hours ago|||

what would be more massive would be phones not booting up because of a botched update. this way you can just switch back to the old partition

marmarama 15 hours ago|||

Not really, because only the OS core is swapped in this way. Apps and data live in their own partitions/subvolumes, which are mutable and shared between OS versions.

The OS core is deployed as a single unit and is a few GB in size, pretty small when internal storage is into the hundreds of GB.

gonzus 13 hours ago|||

Then you are locking yourself out of a pretty much ironclad (and extremely cost-effective) way of managing such things.

alpb 17 hours ago||

Nobody's saying you should deploy code with this, but symlinks are a very common filesystem locking method.
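A small Python sketch of symlink-based locking, assuming a POSIX filesystem. The names `try_lock` and `unlock` are hypothetical. It relies on symlink creation failing if the path already exists, so only one process can create the lock. Storing the PID as the link target is just one possible choice of label.

```python
import os


def try_lock(lock_path: str) -> bool:
    """Try to take the lock by creating a symlink at `lock_path`.

    Returns True if this call created the link, False if it already
    existed. The link target is only a label (here, our PID) and does
    not need to point at a real file.
    """
    try:
        os.symlink(str(os.getpid()), lock_path)
        return True
    except FileExistsError:
        return False


def unlock(lock_path: str) -> None:
    """Release the lock by removing the symlink."""
    os.unlink(lock_path)
```

This sketch does not handle stale locks left by a crashed holder; a real tool would need some policy for that.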