Posted by thunderbong 3 days ago
I don't think I've ever done much with SQLite's JSON functions, but I have on one or two occasions used a constraint to enforce a TEXT column contains valid JSON, which would have been very tedious to do otherwise.
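That constraint is a one-liner with SQLite's `json_valid()` function (part of the JSON1 functionality, built into most Python `sqlite3` builds). A minimal sketch, with hypothetical table/column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        payload TEXT CHECK (json_valid(payload))  -- reject malformed JSON
    )
""")
con.execute("INSERT INTO events (payload) VALUES (?)", ('{"ok": true}',))
try:
    con.execute("INSERT INTO events (payload) VALUES (?)", ('{not json',))
except sqlite3.IntegrityError:
    print("rejected")  # CHECK constraint failed: json_valid(payload)
```

Doing the same validation by hand in every code path that writes the column is exactly the tedium the constraint avoids.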
I very, very much did. I was using a Python package that used a lot of NumPy internally, and sometimes its return values would be Python integers, and sometimes they'd be NumPy integers.
The Python integers would get written to SQLite as SQLite integers, while the NumPy integers would get written as SQLite binary blobs, which prevented even simple things like comparing values for equality.
Setting to STRICT caused an error whenever my code tried to insert a binary blob into an integer column, so I knew where in the code I needed to explicitly convert the values to Python integers when necessary.
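A minimal sketch of that failure mode, using only the stdlib `sqlite3` module (a raw `bytes` value stands in for the NumPy integer; STRICT needs SQLite 3.37+, so the sketch guards on the library version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
if sqlite3.sqlite_version_info >= (3, 37, 0):  # STRICT tables need SQLite 3.37+
    con.execute("CREATE TABLE counts (n INTEGER) STRICT")
    con.execute("INSERT INTO counts VALUES (?)", (7,))  # a real int is fine
    try:
        # a blob bound into an INTEGER column is rejected under STRICT,
        # instead of being silently stored as a blob
        con.execute("INSERT INTO counts VALUES (?)", (b"\x07",))
    except sqlite3.IntegrityError as e:
        print(e)  # e.g. "cannot store BLOB value in INTEGER column counts.n"
```

Without STRICT, the second insert succeeds and the blob sits in the integer column, which is how the NumPy values sneaked in.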
On STRICT mode, I've asked this elsewhere and never gotten an answer: does anyone have a loose-typing example application where SQLite's non-strict, different-type-allowed-for-each-row behavior has been a big benefit? I love the simplicity of SQLite's small number of column types, but the any-type-allowed-anywhere design always seemed a little strange.
I copied the idea for the ->> operator from PostgreSQL. But in PostgreSQL, the ->> operator always returns a text rendering of the value from the JSON, even if the value is really an integer or floating-point number. PG is rigidly typed, so that's all it can do. But SQLite is flexibly typed, so the ->> operator can return anything - text, integer, floating-point, NULL - whatever value it finds in the JSON.
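You can see the native types come back through Python's `sqlite3` with `json_extract()` (which is what ->> maps onto in SQLite; `json_extract` is available on older SQLite builds than the 3.38+ operator syntax):

```python
import sqlite3

con = sqlite3.connect(":memory:")
doc = '{"n": 7, "x": 1.5, "s": "hi", "z": null}'
# each extracted value keeps its JSON type instead of coming back as text
row = con.execute(
    "SELECT json_extract(?, '$.n'), json_extract(?, '$.x'), "
    "json_extract(?, '$.s'), json_extract(?, '$.z')",
    (doc, doc, doc, doc),
).fetchone()
print([type(v).__name__ for v in row])  # ['int', 'float', 'str', 'NoneType']
```

In PostgreSQL the equivalent ->> expressions would all arrive as strings.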
This flexibility is convenient when only one application reads and writes to the table. But if multiple applications access the same tables, the lack of a strictly enforced schema becomes a liability. The same is true when using generic tools to process data in SQLite tables, because such tools don't know what type of data to expect. The column type may be X but the actual data may be of type Y.
Ideally, I'd like to be able to specify the stored type (or at least sidestep numeric affinity) and give the type a name (for introspection and documentation).
Specifying that a column is a DATETIME, a JSON, or a DECIMAL is useful, IMO.
Alas, neither STRICT nor non-STRICT tables allow this.
SQLite came from Tcl, where everything is a string. https://www.tcl-lang.org/
An example of where this would be a benefit is if you stored date/times in different formats (changing as the app evolved).
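A sketch of that scenario, assuming an untyped column holding an old ISO-text format alongside a newer unix-epoch format, normalized at query time with `typeof()`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (created)")  # no declared type: anything goes
con.execute("INSERT INTO log VALUES ('2021-06-01 12:00:00')")  # old format: ISO text
con.execute("INSERT INTO log VALUES (1622548800)")             # new format: unix epoch
# typeof() tells us per-row which format we got, so one query handles both
rows = con.execute("""
    SELECT CASE typeof(created)
             WHEN 'integer' THEN datetime(created, 'unixepoch')
             ELSE created
           END
    FROM log
""").fetchall()
print(rows)
```

A STRICT table would have forced a migration instead; whether that's a cost or a benefit is exactly the question above.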
My preference would be for SQLite to actually support commonplace data types. But as long as it doesn't, I can see the appeal in using the schema to specify what data you're storing in your database.
I think the naive approach is to tokenize the input and append "*" for prefix matching. I'm not too experienced and this can probably be improved a lot. There are many settings like different tokenizers, stemming, etc. Additionally, a lot can be built on top like weighting, boosting exact matches, etc.
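That naive tokenize-and-star approach looks roughly like this with FTS5 (compiled into most Python `sqlite3` builds, though not guaranteed; quoting each token before the `*` keeps FTS5 query syntax from choking on the input):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.executemany("INSERT INTO docs VALUES (?)",
                [("sqlite json functions",), ("strict tables",)])

def prefix_query(user_input):
    # naive: split on whitespace, quote each token, append * for prefix match
    return " ".join(f'"{tok}"*' for tok in user_input.split())

rows = con.execute("SELECT body FROM docs WHERE docs MATCH ?",
                   (prefix_query("sqli jso"),)).fetchall()
print(rows)  # [('sqlite json functions',)]
```

Adjacent terms are an implicit AND in FTS5, so every prefix has to match somewhere in the row.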
Does anyone know good resources for this to learn and draw inspiration from?
Is there a reason something more purpose-built, like ParadeDB Community Edition, wouldn't meet your needs?
I understand you're speaking about SQLite, while ParadeDB is PostgreSQL, but as you know, it's non-trivial to get good search quality, so I'm trying to understand your situation and needs.
O'Reilly's "Relevant Search" isn't the worst here, but you'll be porting/writing a bit yourself.
They recently landed multi-writer support in their Rust SQLite re-implementation; the lack of it has personally been the biggest issue I've had with using SQLite for high-concurrency applications.
`PRAGMA journal_mode = 'mvcc';`
https://docs.turso.tech/tursodb/concurrent-writes
Very excited to see if SQLite responds by adding native support, I'm hoping competition here will spur improvements on both sides.
Curious if others here are running SQLite in production and if you would be interested in something like this.
SELECT
settings -> '$.languages' languages
FROM
user_settings
WHERE
settings ->> '$.languages' LIKE '%"en"%';
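The LIKE trick above works because ->> renders an array as its JSON text, but matching on the serialized text is brittle. A tighter alternative expands the array with `json_each()` and compares element values. A runnable sketch of both (the ->/->>  operators need SQLite 3.38+, so it's version-guarded):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_settings (settings TEXT)")
con.execute("INSERT INTO user_settings VALUES (?)",
            ('{"languages": ["en", "de"]}',))
con.execute("INSERT INTO user_settings VALUES (?)",
            ('{"languages": ["fr"]}',))

if sqlite3.sqlite_version_info >= (3, 38, 0):  # -> / ->> need 3.38+
    # the LIKE trick: match the quoted token inside the array's JSON text
    like_rows = con.execute("""
        SELECT settings -> '$.languages'
        FROM user_settings
        WHERE settings ->> '$.languages' LIKE '%"en"%'
    """).fetchall()
    # the json_each alternative: compare actual element values
    each_rows = con.execute("""
        SELECT settings -> '$.languages'
        FROM user_settings
        WHERE EXISTS (SELECT 1 FROM json_each(settings, '$.languages')
                      WHERE json_each.value = 'en')
    """).fetchall()
    print(like_rows, each_rows)
```

Both return the `["en","de"]` row here, but only the `json_each` form keeps working if an element could contain an embedded quote-plus-token substring.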
I use them heavily with my jekyll-sqlite projects. See https://github.com/blr-today/website/blob/main/_config.yml#L... for example.
Granted, it was a small DB (a few megabytes); I also wanted to avoid collecting changes one by one. I simply wanted a diff against last time.
And ON CONFLICT, which can help dedupe (among other things) in a simple and performant way:
insert into item(key, description) values ('thing', 'a thing') on conflict do nothing;
insert into user_note(uid, key, note) values (123, 'thing', 'I like this thing') on conflict (uid, key) do update set note = excluded.note;
Does anyone have any other favorite modern bloom-filter-based search solutions that don't need to store copies of all the documents in the search DB? Ideally something that can run in WASM too, so we can ship a tiny search index to the browser. I found https://github.com/tinysearch/tinysearch but haven't tried it yet.