Posted by ajayvk 10/27/2024

Using SQLite as storage for web server static content (clace.io)
273 points | 126 comments
Szpadel 10/27/2024|
I don't get the argument; swapping a "current" symlink to point at another version has worked for years as an atomic way to swap two site versions.

Using SQLite for static content might have other benefits, but I don't think this is the main one.
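
For reference, a minimal sketch of that pattern (paths are illustrative; -n and -T are GNU coreutils flags):

    # build the new version into its own directory, e.g. /srv/site-v42,
    # then repoint "current" via a fresh symlink plus rename;
    # rename(2) is atomic, so no request ever sees a half-swapped site
    ln -sfn /srv/site-v42 /srv/current.tmp
    mv -T /srv/current.tmp /srv/current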

ajayvk 10/27/2024|
Swapping symlinks is possible. Using a database (SQLite specifically) has other benefits: deduplication is possible, backups are easier, compressed content can be stored, content hashes can be stored, etc.
webstrand 10/27/2024|||
SQLite isn't necessarily easier to back up than a filesystem. I've got a fairly large (~60GB) SQLite database that I'm somewhat eager to get off of. If I'd stuck with a pure filesystem, backing up only the changeset would be trivial, but with SQLite I have to store a new copy of the database.

I've tried various solutions like litestream and even xdelta3 (which generates patches in excess of the actual changeset size), but I haven't found a solution I'm confident in other than backing up complete snapshots.

fanf2 10/27/2024|||
You might like the new sqlite3_rsync command: https://www.sqlite.org/rsync.html
simonw 10/27/2024||
Yeah, that looks ideal for this exact problem, because it lets you stream a snapshot backup of a SQLite database over SSH without needing to first create a duplicate copy using .backup or vacuum. My notes here: https://til.simonwillison.net/sqlite/compile-sqlite3-rsync
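
Usage is a single command; a sketch, with illustrative host and paths:

    # push a consistent snapshot of a live database to a remote replica;
    # sqlite3_rsync must be installed on both ends, and the docs say the
    # databases need to be in WAL mode
    sqlite3_rsync /data/site.db backup-host:/backups/site.db
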
webstrand 10/27/2024||
Maybe that tool just doesn't fit my use-case, but I'm not sure how you'd use it to do incremental backups? I store all of my backups in S3 Glacier for the cheap storage, so there's nothing for me to rsync onto.

I can see how you'd use it for replication though.

simonw 10/27/2024||
If you want incremental backups to S3 I recommend Litestream.
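
For reference, Litestream's CLI form looks roughly like this (bucket name is illustrative; it also supports a config file):

    # continuously stream WAL frames to S3 as they are written
    litestream replicate /data/site.db s3://backup-bucket/site.db

    # recover the latest replicated state to a local file
    litestream restore -o /data/site.db s3://backup-bucket/site.db
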
hedgehog 10/27/2024|||
What do you do about compaction?
webstrand 10/28/2024||
zstd, though that only shaves off a few GB total.
hedgehog 10/28/2024||
Oh, I mean like with the vacuum command. As the databases get larger it can become unwieldy.
webstrand 10/29/2024||
I just tried it; it only recovered a few MB.
hedgehog 10/29/2024||
Aha, so not much need. I've always avoided SQLite for larger databases due to the extra space needed to allow compaction; maybe that's not a real problem in most applications though.
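
For what it's worth, `VACUUM INTO` (SQLite 3.27+) sidesteps the in-place space concern by writing the compacted copy to a new file; a sketch with illustrative paths:

    # classic in-place compaction; needs temporary space while it runs
    sqlite3 site.db 'VACUUM;'

    # write a compacted copy to a separate file instead
    sqlite3 site.db "VACUUM INTO '/backups/site-compact.db';"
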
theturtle32 10/27/2024||||
You could also employ a different filesystem like ZFS or btrfs in tandem with the symlink-swapping strategy to achieve things like deduplication. Or, once you have deduplication at the filesystem level, just construct a new complete duplicate of the folder to represent the new version and use renaming to swap the old for the new, and poof -- atomic changes and versioning with de-duplication, all while continuing to be able to use standard filesystem paradigms and tools.
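
A sketch of the filesystem-level dedup idea (btrfs shown; paths and pool names are illustrative):

    # reflink copy on btrfs (or XFS): the new tree shares extents with the
    # old one, so it only costs the blocks that later diverge
    cp -r --reflink=always /srv/site-v41 /srv/site-v42

    # on ZFS, a snapshot gives a cheap read-only version instead
    zfs snapshot tank/www@v41
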
borsecplata 10/27/2024||||
Deduplication can be achieved the same way as in SQLite, by keeping files indexed by SHA-256. There are also filesystems that provide transparent compression.

Seeing as you need some kind of layer between the web server and SQLite, you might as well keep a layer between the web server and the FS that nets you most or all of the benefits.
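
A sketch of that hash-indexed layout (paths are illustrative):

    # store each blob once, named by its content hash...
    h=$(sha256sum logo.png | cut -d' ' -f1)
    mv logo.png "/srv/blobs/$h"
    # ...and reference it from the served tree; identical files across
    # versions resolve to the same blob
    ln -s "/srv/blobs/$h" /srv/site-v42/images/logo.png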

warble 10/27/2024|||
All of this is easily done on a filesystem too. I would assume this is a performance tradeoff rather than a feature question?
theturtle32 10/27/2024||
Genuinely, I'm curious to hear from OP whether you explored filesystems that provide de-duplication, snapshots, versioning, and compression, and compared that to the SQLite approach. It would be interesting to get your take on how SQLite compares with a more advanced filesystem like ZFS or btrfs. Once you have a filesystem that can de-dup, atomic changes are more or less a matter of building up a directory with the new version first and then swapping one directory for another with a rename. Though I assume there may be a small race condition in between the two directory rename calls. I don't know if there's a standard call to tell the filesystem "mv current old && mv new current" in one shot.
bhaney 10/27/2024||
> I don't know if there's a standard call to tell the filesystem "mv current old && mv new current" in one shot.

Used to be hackily done by overwriting symlinks, but now there's https://manpages.debian.org/testing/manpages-dev/renameat2.2...
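
Recent GNU coreutils (9.4+, I believe) exposes that call directly; a sketch with illustrative paths:

    # RENAME_EXCHANGE via renameat2(2): swap the two directories in one
    # atomic step, with no window where "current" is missing (Linux only)
    mv --exchange /srv/new /srv/current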

ajayvk 10/27/2024||
Clace is meant to be portable and easy to operationalize. It runs on Linux/macOS/Windows. The only external dependency is Docker/Podman for starting containers. Depending on particular filesystem features would make it much more difficult to set up, so I did not explore that. There would be use cases where using filesystem features would be more appropriate.
withinboredom 10/27/2024|||
If you are using Docker, then you know the target system is Linux (unless you are building Windows containers). There is no reason not to rely on filesystem features.
ajayvk 10/28/2024||
Clace runs natively on Windows and macOS; it does not have to run in a container. It can then start containers, either Linux or Windows (the specs at https://clace.io/docs/container/overview/#app-specs are Linux by default).
theturtle32 10/28/2024|||
Makes sense! Portability as a core feature is a good reason not to use platform-specific tech.
dogaar 10/28/2024||
This article seems to suggest that atomic application updates on the server side are by themselves a solution to application versioning, which they aren't. As long as client-side asset caching is enabled, which it is by default for all web applications, the neatly separated application versions will get all mixed up in the browser. One solution would be to serve each application version under a different path prefix.
ajayvk 10/27/2024||
I have been building https://github.com/claceio/clace, a project which makes it easier to manage web apps for internal use (locally and across a team). SQLite is used to store static files. This post talks about the reasoning behind that.
MathMonkeyMan 10/27/2024||
This gives me an idea.

Use git to manage versions of the static files.

Use [git worktree][1] to have multiple separate working trees in different directories.

Use [openat()][2] in your app whenever serving a request for a static file. On each request, open the directory, i.e. `/var/www/mysite.com`, and then use `openat()` with the directory's file descriptor when looking up any files under that directory.

`/var/www/mysite.com` is a symbolic link to some working tree.

When modifying the site, make modifications to the git repository. Then check out the modified working tree into a new directory.

Create a symbolic link to the new directory and `mv` (rename) it to `/var/www/mysite.com`.

Then some time later, `rm -r` the old working tree.

[1]: https://git-scm.com/docs/git-worktree

[2]: https://pubs.opengroup.org/onlinepubs/9799919799/functions/o...
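
A sketch of the worktree half of this (repo path, tag, and directories are illustrative; the openat() part lives in the server code):

    # materialize the new version in its own directory
    git -C /srv/site.git worktree add /var/www/releases/v42 v42

    # atomically repoint the symlink, as described above
    ln -sfn /var/www/releases/v42 /var/www/mysite.com.tmp
    mv -T /var/www/mysite.com.tmp /var/www/mysite.com

    # some time later, retire the old tree
    git -C /srv/site.git worktree remove /var/www/releases/v41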

mistrial9 10/27/2024|
> Use git to manage versions of the static files

bad match -- git stores every version in the history forever. Do you really need every revision of a binary file, completely duplicated? Big files make it even worse.

MathMonkeyMan 10/27/2024||
Hm, good point. I suppose you could use [more git commands][1] to enforce a "nothing older than N commits" policy. But now the solution lacks that dead simple allure.

[1]: https://www.reddit.com/r/git/comments/wk2kqy/delete_files_un...

earthboundkid 10/27/2024||
I'm continually going through this thought process:

1. I should make blog engine using flat files. That way I can diff the plain text.

2. But flat files are hard to read/write structured data from, so I should use SQLite instead.

3. But SQLite is hard to diff. GOTO 1.

crazygringo 10/27/2024||
In every project of mine with databases, I use a tool to export the schema (without data) as a text SQL file on every commit (as a pre-commit hook), to record any changes.

There's no reason you can't do the same but including the content too.

That way you're always committing a text friendly version of the database rather than the database binary encoding. And your diffs will work great.
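
A sketch of that hook, extended to include content (database and file names are illustrative):

    #!/bin/sh
    # pre-commit hook: dump schema plus data as SQL text so diffs are readable
    sqlite3 site.db .dump > site.sql
    git add site.sql
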

sureglymop 10/27/2024|||
But with SQLite, would it be that hard? You could use sqldiff, or even something like this:

diff <(sqlite3 db 'select text from posts where id = 1') <(sqlite3 db 'select text from posts where id = 2')
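
And sqldiff, which ships with the SQLite tools, compares two database files directly:

    # emits the SQL statements that would transform one database into the other
    sqldiff old.db new.db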

ajayvk 10/28/2024||
There is a `version files` command to list the files for a version, including the SHA. I also plan to add an export command to make diffing easier.
redleader55 10/27/2024||
This is interesting as a toy and I'm sure the authors are having a lot of fun implementing their idea.

On the other hand, the state of the art when it comes to performance is zero-copy from disk to NIC using io_uring and DMA, combined with NICs that support TLS & checksum offloading and separate header and payload buffers that can be filled independently by the OS and user space.

I wonder if the authors of these projects ask themselves: what reasons are there to not do it like this? This thread has a few answers.

ajayvk 10/27/2024|
I am the author (of the blog post and of the Clace project). Clace is built for one specific use case: internal web apps, to run locally or on a shared server, for use by a team. Clace is not a general purpose web server. It is an application server which uses containers and sandboxing (when running Starlark code) to allow for multiple apps to be managed easily. Clace just happens to implement some web server features for operational simplicity, to avoid requiring an additional component to be managed.

I am not claiming a SQLite database is always the right approach for serving static files. In my particular use case for Clace, I found a database to work better.

fsiefken 10/27/2024||
I read "There is no equivalent implementation using the file system to compare against, so a direct benchmark test is not done." But Btrfs and ZFS have file versioning built-in, not just snapshots, but also individual files - this can be used to make a similar setup, without using binary blobs, but just the filesystem references in sqlite or hard links - which might even perform faster.
ajayvk 10/27/2024|
I meant there is no Clace implementation which uses the file system instead of the database. That would be required for a direct benchmark. For dev mode apps, Clace loads files from disk, but that dev mode has many differences from the prod mode.
superkuh 10/27/2024||
So the entire argument here is that using SQLite to store files prevents visitors from seeing partial/mid-way changes when updating a website (a process that would take seconds)? Is that really a problem?

Otherwise the file system is far superior for every "benefit" they list. And far easier to work with other software and more robust over time (nothing to change/break).

throwawayie6 10/27/2024||
I made something similar once, but simply wrote the output to a temporary file and then renamed it to the final name when it was done.

From what I remember, this was an atomic operation, and if there was a download in progress it would continue using the old file, because the data was still on disk and the filename was simply a pointer to the inode.

This may behave differently on other file systems. This was done on an old Linux server with ext3

Seems like a simpler solution than using a db.
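
Roughly this, as a sketch (the generator command is hypothetical):

    # write to a temp file on the same filesystem, then rename into place;
    # rename(2) is atomic, and in-flight downloads keep reading the old
    # inode until they close it
    tmp=$(mktemp /var/www/site/.index.html.XXXXXX)
    generate-page > "$tmp"
    mv "$tmp" /var/www/site/index.html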

okl 10/27/2024||
One benefit they list is storing associated metadata in the database (specifically, different types of hashes are mentioned), which is not so easy with a file system.

I think the bigger benefit, though, is the increased read performance on many small files (saving system call overhead). To what extent that applies to static files that a smart server might keep in cache, I don't know.

superkuh 10/27/2024||
I'm not sure what associated metadata is in this context, but www/path/to/filename.jpg and www/path/to/filename.jpg.json would work and be very file-y. I take their/your point about it not being directly integrated though.
jazzyjackson 10/27/2024|
Isn't there a bottleneck at your egress? I thought SQLite was fast at reads but only handles one read at a time, so if someone is downloading a gigabyte binary out of your db, every other connection just has to wait. Am I wrong?

Something I want to try is using SQLite as my filesystem but storing only content hashes that point to an S3-compatible object store, so you get the atomicity and rollbackability and all, but you also get the massive parallelization and multi-region availability of an object store.

Edit: I googled it again and found that multiple processes can read SQLite concurrently, so it shouldn't be a problem.

https://stackoverflow.com/questions/4060772/sqlite-concurren...
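
A hypothetical sketch of that split (bucket name and the files table schema are made up):

    # blob goes to object storage, keyed by content hash
    h=$(sha256sum app.js | cut -d' ' -f1)
    aws s3 cp app.js "s3://assets-bucket/blobs/$h"

    # sqlite keeps only the path -> hash mapping, so version swaps and
    # rollbacks stay transactional while the bytes live in S3
    sqlite3 site.db "INSERT INTO files(path, sha256) VALUES ('/app.js', '$h');"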

catlifeonmars 10/27/2024|
There’s still a bottleneck in that you have to host it on the same machine and egress from the same network interface. Classic tradeoff between consistency and availability though.