Posted by HunOL 2 days ago
You get SQLITE_BUSY when transaction #1 starts in read mode, transaction #2 starts in write mode, and then transaction #1 attempts to upgrade from read to write mode while transaction #2 still holds the write lock.
The fix is to set a busy_timeout and to begin any transaction that does a write (any write, even if it is not the first operation in the transaction) in “immediate” mode rather than “deferred” mode.
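In Python's sqlite3, a minimal sketch of that fix might look like this (autocommit mode so we can issue BEGIN IMMEDIATE ourselves; the visits table is just for illustration):

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit; we issue BEGIN ourselves
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5s for locks instead of failing at once

    def record_visit(conn, url):
        # BEGIN IMMEDIATE takes the write lock up front, so this transaction
        # never needs the read->write upgrade that triggers SQLITE_BUSY
        conn.execute("BEGIN IMMEDIATE")
        try:
            (n,) = conn.execute("SELECT count(*) FROM visits WHERE url = ?", (url,)).fetchone()
            conn.execute("INSERT INTO visits(url, nth) VALUES (?, ?)", (url, n + 1))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise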
https://zeroclarkthirty.com/2024-10-19-sqlite-database-is-lo...
I have a Python web app that creates a DB connection per request (not ideal, I know) and immediately attaches 3 auxiliary DBs. This is a low-traffic site, but we have a serious reliability problem when load increases: the ATTACH calls occasionally fail with "database is locked". I don't know if this is because ATTACH fails immediately without respecting the normal 5-second busy timeout, or what. To be honest I haven't implemented connection pooling yet because I want to understand what exactly causes this problem.
FWIW, "one connection per request is bad" (for SQLite) is FUD, plain and simple. SQLite's own forum software creates one connection per request (it creates a whole forked process per request, for that matter) and we do not have any problems whatsoever with that approach.
Connection pools (with SQLite) are a solution looking for a problem, not a solution to a real problem.
Note that the application needs to be aware that there are two pools — one for write operations and one for reads (the latter with no connection limit, or a high one). The separation can be enforced at the SQLite level too, by adding ?_query_only=1 to the connection parameters or setting the respective pragma in the read-only pool.
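A rough sketch of that split in Python (plain connections standing in for real pools; query_only is the pragma equivalent of the ?_query_only=1 parameter):

    import sqlite3

    def open_writer(path):
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout=5000")
        return conn  # exactly one of these should be live at a time

    def open_reader(path):
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("PRAGMA busy_timeout=5000")
        conn.execute("PRAGMA query_only=1")  # this connection refuses writes
        return conn  # as many of these as you like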
Honestly, it's the key to getting the most out of SQLite. It also allows for transaction batching and various other forms of batching that can massively improve write throughput.
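As a sketch of what that batching can look like — a dedicated writer thread draining a queue into one transaction (batch size and events table are illustrative):

    import queue, sqlite3, threading

    jobs = queue.Queue()

    def writer(path):
        conn = sqlite3.connect(path, isolation_level=None)  # created in this thread
        conn.execute("PRAGMA journal_mode=WAL")
        while True:
            batch = [jobs.get()]            # block for the first item
            while len(batch) < 100:         # then drain whatever else is queued
                try:
                    batch.append(jobs.get_nowait())
                except queue.Empty:
                    break
            conn.execute("BEGIN IMMEDIATE")  # one commit/fsync for the whole batch
            conn.executemany("INSERT INTO events(payload) VALUES (?)", batch)
            conn.execute("COMMIT")
            # real code would also signal completion back to callers (e.g. via a Future)

    threading.Thread(target=writer, args=("app.db",), daemon=True).start()
    jobs.put(("hello",))  # callers just enqueue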
The problem is that we need to contort our software to make SQLite not suck at writes.
I await the write to complete before my next read in my application logic, same as any other bit of code that interacts with a database or does other IO. Just because another thread handles interacting with the writer connection, doesn't mean my logic thread just walks away pretending the write finished successfully in 0ms.
>Who knows when those writes you scheduled really get written
When a commit completes for a transaction, that transaction has been durably written. No mystery. That's true whether you decide to restrict writes to a single thread in your application or not.
Usually this is true but there are edge cases for certain journaled file systems. IIRC sqlite.org has a discussion on this.
Can't currently find it but I guess it comes under the "if the OS or hardware lies to SQLite, what can it do?" banner?
Dislocating DML from the code that triggers it creates many problems around ensuring data integrity, and it divorces you from consistent reads of uncommitted data that you may want to tightly control before committing. By punting to a dedicated writer you're removing the ability to ensure serialised modification of your data and the ability to cleanly react to integrity errors as they arise. If you don't need that? Go ahead. But it's not FUD. We build relational, ACID-compliant databases this way for a reason.
I just meant that if you can structure your application to run write transactions in a single thread (the whole transaction and its associated logic, not just deferring writing the end result to a separate thread), then you minimize contention at the SQLite level.
This becomes increasingly inefficient as contention increases, as you can easily get into a situation where everyone is sleeping, waiting for others, for a few milliseconds.
Ensuring all, or most, writes are serialized, improves this.
The default is DELETE mode, where the rollback journal is deleted at the conclusion of each transaction. What's more, in this (non-WAL) mode readers can coexist, but they block the (single) writer and the writer blocks readers — concurrency is highly limited.
In WAL mode — which you should pretty much always set — there's still at most one writer, but the writer can coexist with readers.
Also, this applies to WAL mode (and I believe only to WAL mode, since there are no truly concurrent reads in the other modes).
The reason is that pages in WAL mode are appended to a single log file. Hence, if you read something inside a BEGIN transaction and later want to mutate something else, another page could already have been appended, potentially violating the strict serializability guarantee of WAL mode. So SQLite has to fail at the point of the lock upgrade.
Immediate mode solves this problem because the write lock is acquired at BEGIN time, so no page can be appended between the read and the write, unlike in deferred mode.
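You can reproduce the failed upgrade in a few lines (a sketch; note SQLite skips the busy handler here because waiting for the upgrade could deadlock, so the error is immediate):

    import sqlite3

    r = sqlite3.connect("demo.db", isolation_level=None)
    w = sqlite3.connect("demo.db", isolation_level=None)
    r.execute("PRAGMA journal_mode=WAL")
    r.execute("CREATE TABLE IF NOT EXISTS t(x)")

    r.execute("BEGIN")                          # deferred: takes a read snapshot...
    r.execute("SELECT count(*) FROM t").fetchone()
    w.execute("BEGIN IMMEDIATE")                # ...while another tx holds the write lock
    w.execute("INSERT INTO t VALUES (1)")
    try:
        r.execute("INSERT INTO t VALUES (2)")   # read->write upgrade attempt
    except sqlite3.OperationalError as e:
        print(e)                                # "database is locked"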
It's just weird that it's set to 0 by default rather than something reasonable like 3000 or 5000 ms.
A similar design for SQLite would allow only one writer, with all other processes passing their SQL to it.
In WAL mode, writers and readers don’t interfere with each other, so you can still do pure read queries in parallel.
Only one writer is allowed at a time no matter what, so writers queue up and you have to take the write lock at some point anyway.
In general, it’s hard to say without benchmarking your own application. This will get rid of SQLITE_BUSY errors firing immediately in the read/write/upgrade-read-to-write scenario I described, however. You’d be retrying the transactions that fail with SQLITE_BUSY anyway, so that retrying is what you’d need to benchmark against.
It’s a subtle problem, but I’d rather queue up writes than have to write the code that retries failed transactions that shouldn’t really be failing.
However, it screams of a broken implementation.
Imagine if Linux PAM logins randomly failed if someone else was concurrently changing their password or vice versa.
In no other application would random failures due to concurrency be tolerated.
SQLite is broken by design; the world shouldn’t give them a free pass.
We had some old Android tablets using our app 8 hours a day for 3-4 years. They'd complain of locking errors and slowness, but every time they'd copy their data to send to us, we couldn't replicate, even on the same hardware. It wasn't until we bought one user a new device and got them to send us the old one that we could check it out. We thought maybe the SSD had worn out over the few years of continual use, but installing a dev copy of our app was super fast. In the end what did work was to "defrag" the db file by copying it to a new location, deleting the original, then moving it back to the same name. Boom, no more "unable to open database" errors, no more slowdowns.
I tried this on Jellyfin dbs a few months ago after running it for years and then suddenly running into performance issues, it made a big difference there too.
I would say that the much more common degradation is caused by write amplification due to a nearly full flash drive (or a flash drive that appears nearly full to the FTL because the system doesn't implement some TRIM-like mechanism to tell the FTL about free blocks). This generally leads to systemwide slowdown though rather than slowdown accessing just one particular file.
This was especially prevalent on some older Android devices which didn't bother to implement TRIM or an equivalent feature (which even affected the Google devices, like the Nexus 7).
My understanding of the parent reply's situation is that this was happening on the tablets of their users, so it kinda doesn't matter that it can be avoided by not using cheap tablets.
Most apps aren't in a position to tell their users that they are on their own when they run into what feels like an unreasonable app slowdown because they didn't buy a good enough device to run it on, especially when they've previously experienced it running just fine.
If all their apps feel like crap on that tablet, sure, that might fly... but if it's only your app (or only a small set of apps that use SQLite in the same way the OP's company did) that feels like crap after a while, that's effectively a you problem (to solve) even if it's not really a you problem.
In any case, it's an interesting data point and could be very useful information to others who run into similar issues.
I'm merely saying that the root cause was misidentified - the performance degradation didn't happen due to fragmentation, but because the flash storage was degraded to the point where the write performance dropped significantly. This happens faster for eMMC vs. SSD-style storage.
Copying the DB file moved the data to different storage blocks which is why it (temporarily again) improved performance.
Sadly that's a common plague for cheaper Android hardware - after enough writes the flash performance drops off a cliff making those devices essentially unusable :/
(More expensive hardware - including Apple's - tends to have UFS-type storage which lasts longer.)
The other workaround to get a speed boost was for the user to uninstall and reinstall the app (and then wait for all the data to download again), but that didn't fly because users would delete before they'd synced off all their data, and then data would go missing.
This was all despite having VACUUM running whenever the app started.
Whether it was bad flash or not, we still had to try to resolve it, as the client wouldn't buy new hardware until we could prove we had the knowledge to make the problem go away first :/
Is that even still a thing? I thought modern filesystems like ext4 were supposed to be largely immune to that.
The way ext4 reduces fragmentation is with some basic heuristics: mainly, it spreads files across the full disk instead of finding the next free spot. So they have room to grow without fragmenting. When the space gets low, it fragments just as badly as older file systems unfortunately.
https://sqlite.org/lang_vacuum.html
(Edit: if multiple processes are concurrently reading and writing, and one process vacuums, verify that the right things happen — specifically, that concurrent writes from other processes during a vacuum don’t get erased by it. You may need an external advisory lock to avoid data loss.)
This is not true. From the link you posted:
> The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file.
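Which is also why VACUUM INTO is a convenient way to get the copy-and-replace "defrag" described upthread in one statement (a sketch; the target file must not already exist, and it needs SQLite 3.27+):

    import sqlite3
    conn = sqlite3.connect("app.db")
    conn.execute("VACUUM INTO 'app-compacted.db'")  # defragmented copy, original untouched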
I always get optimize and vacuum mixed up.
HN does not support whatever markup you are trying to use. You have to use Unicode:
“You can VACUUM INTO, b̶u̶t̶ ̶s̶t̶a̶n̶d̶a̶r̶d̶ ̶v̶a̶c̶u̶u̶m̶ ̶w̶o̶n̶’̶t̶ ̶r̶e̶w̶r̶i̶t̶e̶ ̶t̶h̶e̶ ̶w̶h̶o̶l̶e̶ ̶d̶b̶ (vacuum rewrites the whole db)”
Success, performance increase.
Failure, no change.
Shouldn't the file be moved to a different part of the disk first, for that to happen?
Again: zip is a backup
> If your application fully manages this file, the assumption must be made that your application is the sole owner of this file, and nobody else will tinker with it while you are writing data to it.
Kind of, but sqlite does locking for you, so you don't have to do anything to ensure your process is the only one writing to the db file.
> [The WAL] allows multiple parallel writes to take place and get enqueued into the WAL.
The WAL doesn't allow multiple parallel writes. It just allows reads to be concurrent with a single write transaction.
You need a brace of PRAGMAs to get it to behave reasonably sanely if you do anything serious with it.
    PRAGMA foreign_keys=ON
    PRAGMA recursive_triggers=ON
    PRAGMA journal_mode=WAL
    PRAGMA busy_timeout=30000
    PRAGMA synchronous=NORMAL
    PRAGMA cache_size=10000
    PRAGMA temp_store=MEMORY
    PRAGMA wal_autocheckpoint=1000
    PRAGMA optimize <- run at connection close / periodically
Note that I do not use auto_vacuum, as DELETEs are uncommon in my workflows and I am fine with the trade-off; if I do need it I can always PRAGMA it. defer_foreign_keys is useful if you understand the pros and cons of enabling it.
Except for long lived connections where you do it periodically.
https://www.sqlite.org/lang_analyze.html#periodically_run_pr...
In any case, there is no harm in setting sticky pragmas every connection.
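For example, a connect helper that stamps the pragmas from the list above on every new connection (a sketch in Python):

    import sqlite3

    PRAGMAS = """
        PRAGMA foreign_keys=ON;
        PRAGMA recursive_triggers=ON;
        PRAGMA journal_mode=WAL;
        PRAGMA busy_timeout=30000;
        PRAGMA synchronous=NORMAL;
        PRAGMA cache_size=10000;
        PRAGMA temp_store=MEMORY;
        PRAGMA wal_autocheckpoint=1000;
    """

    def connect(path):
        conn = sqlite3.connect(path, isolation_level=None)
        conn.executescript(PRAGMAS)  # sticky or not, re-applying is harmless
        return conn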
For example? I'm surprised by the downvotes. Using mmap significantly reduced my average read query time; durations dropped to about 70% of what they were!
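For reference, enabling it is a single pragma; the size is in bytes and the value here is just illustrative:

    import sqlite3
    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA mmap_size=268435456")  # allow up to 256 MiB of memory-mapped I/O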
https://sqlite.org/compile.html#recommended_compile_time_opt...
The SQLite team also has 2 branches that address concurrency and may someday merge to trunk, but the team is by nature quite conservative and it may never happen unless they feel it passes muster.
https://www.sqlite.org/src/doc/begin-concurrent/doc/begin_co... https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
As to the problem that prompted the article, there's another way of addressing it that is kind of a kludge but is guaranteed to work in scenarios like theirs: have each thread in the parallel scan write to its own temporary database and then bulk import them once the scan is done.
It's easy to get hung up on having "a database" but sharding to different files by use is trivial to do.
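A sketch of that kludge: each worker writes its own shard file, then the main connection ATTACHes and merges them (table and file names are illustrative):

    import sqlite3

    def merge_shards(main_path, shard_paths):
        conn = sqlite3.connect(main_path, isolation_level=None)
        for shard in shard_paths:
            conn.execute("ATTACH DATABASE ? AS shard", (shard,))
            conn.execute("BEGIN")
            conn.execute("INSERT INTO main.files SELECT * FROM shard.files")
            conn.execute("COMMIT")                  # commit before DETACH is allowed
            conn.execute("DETACH DATABASE shard")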
Another thing to bear in mind with a lot of SQLite use cases is that the data is effectively read only save for occasional updates. Read only databases are a lot easier to deal with regarding locking.
It’s the classic OLAP (DuckDB) vs OLTP (SQLite) trade off between the two. DuckDB is very good at many things but most applications that need a traditional SQL DB will probably not perform well if you swap it over to DuckDB.
Our evaluation of DuckDB in 2024 concluded that (1) the major limitations were lack of range-scan and index-lookup performance (maybe with joins? or UPDATE WHERE?), and (2) the DuckDB Node.js module segfaulted too much. Perhaps the engineers somehow missed the ART index; it could also have been the restriction that data fit in memory to create an index on it (our test dataset was about 50 GB).
"A lot easier" sounds like an understatement. What's there to lock when the data is read only?
I presume the `hc` part in the project's code name stands for High Concurrency.
[1] https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
  So, I decided on three locking strategies:
  No-Lock
  Optimistic locking
  Pessimistic locking
  As a default, the no-lock behavior does exactly what the name implies: nothing. This is the default because my research shows that for 99% of cases this is not an issue, and every interaction at this level will slow down the whole application.
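For comparison, the optimistic flavor usually boils down to a version check at write time (a generic sketch with a hypothetical docs table, not necessarily what the parent implemented):

    class Conflict(Exception):
        pass

    def save(conn, doc_id, new_body, seen_version):
        cur = conn.execute(
            "UPDATE docs SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, doc_id, seen_version))
        if cur.rowcount == 0:          # row changed since we read it
            raise Conflict("someone else updated this row; re-read and retry")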
Aren't the mutexes in the more modern implementations (like Cosmo [0]) & runtimes (like Go [1]) already optimized so applications can use mutexes fearlessly?

This is something that I think I could fairly easily ameliorate if I could simply load-balance the application server by user, but historically (with Emby), I've not been able to do that due to SQLite locking not allowing me to run multiple instances pointing to the same config instance.
There's almost certainly ways to do this correctly with SQLite but if they allowed for using almost literally any other database this would be a total non-issue.
ETA:
For clarification if anyone is reading this, all this media LEGALLY OBTAINED with PERMISSION FROM THE COPYRIGHT HOLDER(S).
I don't quite get the "in software" part. I assume you mean that the video needs to be transcoded to h.264 on your server for their client to play it.
The way I mostly solved this is to ask people to install and use the native app (jellyfin-media-player or Android app) whenever possible, as it is compatible with more codecs.
You can also configure HW acceleration for transcoding, a decent GPU should have no trouble encoding a few h.264 streams in real time.
And lastly, you can play with distributed versions of ffmpeg, since Jellyfin calls ffmpeg. There are multiple options, such as https://hub.docker.com/r/bitwrk/jellyfin-rffmpeg (I never used it myself, though).
I'll look into the distributed ffmpeg.
>[...] it also opens up new possibilities - not officially yet, but soon - for running Jellyfin backed by "real" database systems like PostgreSQL, providing new options for redundancy, load-balancing, and easier maintenance and administration. The future looks very bright!
You have to have at least a slight idea about the specifics, from the different types of vacuum to how it behaves in low-memory conditions. The idea that Docker has something to do with this is a misdirection at best.
And if you think sqlite has many knobs and special modes, wait until you hear about Postgres.
And why do you think I think that?
SQLite is probably the better option here and in most places where you want portability though.
The effort required to put an application on Kubernetes is a pretty good indicator of software quality. In other words, I can have a pretty good idea about how difficult a software is to maintain in a single-instance configuration by trying to port it to Kubernetes.
[1] https://jellyfin.org/posts/efcore-refactoring [2] https://jellyfin.org/posts/jellyfin-release-10.11.0/
Now maybe you could have an abstraction layer over your storage layer that supports multiple data stores, including a distributed one. But that comes with tradeoffs, like being limited to the least common denominator of features of the data stores, and having to implement the abstraction layer for multiple data stores.
> Distributed systems have many failure modes that you don't have to worry about in non-distributed systems.
Yes, but as previously mentioned, those failure modes are handled by abiding by a few simple principles. It’s also worth noting that multiprocess or multithreaded software has many of the same failure modes, including the one discussed in this post. Architecting systems as though they are distributed largely takes care of those failure modes as well, making even single-node software like Jellyfin more robust.
> Now maybe you could have an abstraction layer over your storage layer that supports multiple data stores, including a distributed one. But that comes with tradeoffs, like being limited to the least common denominator of features of the data stores, and having to implement the abstraction layer for multiple data stores.
Generally I just target storage interfaces that can be easily distributed — things like Postgres (or maybe dqlite?) for SQL databases, or an object storage API instead of a filesystem API. If you build a system as if it could be distributed one day, you’ll end up with a simpler, more modular system even if you never scale past one node (maybe you just want to take advantage of parallelism on your single node, as was the case in this blog post).
But as I mentioned above, that makes the system more complicated for people who don't need it to be distributed.
Setting up separate db software, configuring the connection, handling separate updates, etc. is a lot more work for most users than Jellyfin just using a local embedded sqlite database. And it would probably make the application code more complicated as well.
You can package a Postgres database with your app just like SQLite. Users should not have to know that they are using Postgres, much less configure connections, handle updates, etc.
> And it would probably make the application code more complicated as well.
Not at all, this is an article about the hoops the application has to jump through to make SQLite behave well with parallel access. Postgres is designed for parallel access by default. It’s strictly simpler from the perspective of the application.
You technically can. But that is much more difficult to do than including sqlite, and isn't how postgresql was meant to be used. And what happens when you want to upgrade the major version of postgresql? Do you now include two versions of postgresql so that you can convert old databases to the new postgresql format? I certainly wouldn't say it is "just like SQLite".
My point is to treat it like software from that lineage and you won't have a problem, trying to treat it like something it's not, like a distributed web app, will lead to issues.
I already had to do that for my authoritative PG deployment, and my media manager shouldn't require a full RDBMS.
Using SQLite for Jellyfin has made running it wherever really, really easy, same thing with doing backups and lazy black box debugging.
You probably need to support this for your testsuite anyway.
I think the author of this article missed SQLITE_BUSY.
Once you do have it set up correctly, are handling a single writer at the application level, and have Litestream set up, you're off to the races, assuming your app can scale on a single box (it most likely can).
It's like saying "oh, you want to visit Austrian country side next month and you're asking for advice for best tent? How about you build a cabin instead?".
Jellyfin is by far the least reliable application I run, but it also seems to be best in class.
Emby has a scarily-ancient install process, but it's been working just fine with less hassle.
A stateless design where a stateless jellyfin server talks to a postgres database would be simpler and more robust.
I certainly don’t mind if someone is pushing the limits of what SQLite is designed for but personally I’d just rather invest the (rather small) overhead of setting up a db server if I need a lot of concurrency.
We're having trouble with memory usage when using SQLite in-memory DBs with "a lot" of inserts and deletes — inserting up to 100k rows in 5 minutes, deleting them all after 5 minutes, and doing this for days on end. We see memory usage slowly creep up over hours/days when doing that.
Any settings that would help with that? It's particularly bad on macOS, we've had instances where we reached 1GB of memory usage according to Activity Monitor after a week or so.
However... what you (and OP) are looking for might be pragma shrink_memory [1].
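That one's easy to experiment with (a sketch; the table and the purge are stand-ins for the workload described above):

    import sqlite3
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples(x)")
    # ... bulk inserts happen here, then the periodic purge ...
    conn.execute("DELETE FROM samples")
    conn.execute("PRAGMA shrink_memory")  # ask this connection to free as much heap as it can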