Posted by thunderbong 3 days ago
I don't think I've ever done much with SQLite's JSON functions, but I have on one or two occasions used a constraint to enforce a TEXT column contains valid JSON, which would have been very tedious to do otherwise.
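That constraint is a one-liner with SQLite's `json_valid()` function (part of the JSON1 functionality, built into most Python `sqlite3` builds). A minimal sketch, with hypothetical table/column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        payload TEXT CHECK (json_valid(payload))  -- reject malformed JSON
    )
""")
con.execute("INSERT INTO events (payload) VALUES (?)", ('{"ok": true}',))
try:
    con.execute("INSERT INTO events (payload) VALUES (?)", ('{not json',))
except sqlite3.IntegrityError:
    print("rejected")  # CHECK constraint failed: json_valid(payload)
```

Doing the same validation by hand in every code path that writes the column is exactly the tedium the constraint avoids.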
I very, very much did. I was using a Python package that used a lot of NumPy internally, and sometimes its return values would be Python integers, and sometimes they'd be NumPy integers.
The Python integers would get written to SQLite as SQLite integers, while the NumPy integers would get written as SQLite binary blobs, which prevented even simple things like comparing values for equality.
Setting to STRICT caused an error whenever my code tried to insert a binary blob into an integer column, so I knew where in the code I needed to explicitly convert the values to Python integers when necessary.
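A minimal sketch of that failure mode, using only the stdlib `sqlite3` module (a raw `bytes` value stands in for the NumPy integer; STRICT needs SQLite 3.37+, so the sketch guards on the library version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
if sqlite3.sqlite_version_info >= (3, 37, 0):  # STRICT tables need SQLite 3.37+
    con.execute("CREATE TABLE counts (n INTEGER) STRICT")
    con.execute("INSERT INTO counts VALUES (?)", (7,))  # a real int is fine
    try:
        # a blob bound into an INTEGER column is rejected under STRICT,
        # instead of being silently stored as a blob
        con.execute("INSERT INTO counts VALUES (?)", (b"\x07",))
    except sqlite3.IntegrityError as e:
        print(e)  # e.g. "cannot store BLOB value in INTEGER column counts.n"
```

Without STRICT, the second insert succeeds and the blob sits in the integer column, which is how the NumPy values sneaked in.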
On STRICT mode, I've asked this elsewhere and never gotten an answer: does anyone have a loose-typing example application where SQLite's non-strict, different-type-allowed-for-each-row behavior has been a big benefit? I love the simplicity of SQLite's small number of column types, but the any-type-allowed-anywhere design always seemed a little strange.
I copied the idea for the ->> operator from PostgreSQL. But in PostgreSQL, the ->> operator always returns a text rendering of the value from the JSON, even if the value is really an integer or floating-point number. PG is rigidly typed, so that's all it can do. But SQLite is flexibly typed, so the ->> operator can return anything - text, integer, floating-point, NULL - whatever value it finds in the JSON.
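You can see the native types come back through Python's `sqlite3` with `json_extract()` (which is what ->> maps onto in SQLite; `json_extract` is available on older SQLite builds than the 3.38+ operator syntax):

```python
import sqlite3

con = sqlite3.connect(":memory:")
doc = '{"n": 7, "x": 1.5, "s": "hi", "z": null}'
# each extracted value keeps its JSON type instead of coming back as text
row = con.execute(
    "SELECT json_extract(?, '$.n'), json_extract(?, '$.x'), "
    "json_extract(?, '$.s'), json_extract(?, '$.z')",
    (doc, doc, doc, doc),
).fetchone()
print([type(v).__name__ for v in row])  # ['int', 'float', 'str', 'NoneType']
```

In PostgreSQL the equivalent ->> expressions would all arrive as strings.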
This flexibility is convenient when only one application reads and writes to the table. But if multiple applications access the same tables, the lack of a strictly enforced schema becomes a liability. The same is true when using generic tools to process data in SQLite tables, because such tools don't know what type of data to expect. The column type may be X but the actual data may be of type Y.
Ideally, I'd like to be able to specify the stored type (or at least sidestep numeric affinity) and give the type a name (for introspection and documentation).
Specifying that a column is a DATETIME, a JSON, or a DECIMAL is useful, IMO.
Alas, neither STRICT nor non-STRICT tables allow this.
SQLite came from Tcl, where everything is a string. https://www.tcl-lang.org/
An example of where this would be a benefit is if you stored date/times in different formats (changing as the app evolved).
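A sketch of that scenario, assuming an untyped column holding an old ISO-text format alongside a newer unix-epoch format, normalized at query time with `typeof()`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (created)")  # no declared type: anything goes
con.execute("INSERT INTO log VALUES ('2021-06-01 12:00:00')")  # old format: ISO text
con.execute("INSERT INTO log VALUES (1622548800)")             # new format: unix epoch
# typeof() tells us per-row which format we got, so one query handles both
rows = con.execute("""
    SELECT CASE typeof(created)
             WHEN 'integer' THEN datetime(created, 'unixepoch')
             ELSE created
           END
    FROM log
""").fetchall()
print(rows)
```

A STRICT table would have forced a migration instead; whether that's a cost or a benefit is exactly the question above.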
My preference would be for SQLite to actually support commonplace data types. But as long as it doesn't, I can see the appeal in using the schema to specify what data you're storing in your database.
I think the naive approach is to tokenize the input and append "*" for prefix matching. I'm not too experienced and this can probably be improved a lot. There are many settings like different tokenizers, stemming, etc. Additionally, a lot can be built on top like weighting, boosting exact matches, etc.
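That naive tokenize-and-star approach looks roughly like this with FTS5 (compiled into most Python `sqlite3` builds, though not guaranteed; quoting each token before the `*` keeps FTS5 query syntax from choking on the input):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.executemany("INSERT INTO docs VALUES (?)",
                [("sqlite json functions",), ("strict tables",)])

def prefix_query(user_input):
    # naive: split on whitespace, quote each token, append * for prefix match
    return " ".join(f'"{tok}"*' for tok in user_input.split())

rows = con.execute("SELECT body FROM docs WHERE docs MATCH ?",
                   (prefix_query("sqli jso"),)).fetchall()
print(rows)  # [('sqlite json functions',)]
```

Adjacent terms are an implicit AND in FTS5, so every prefix has to match somewhere in the row.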
Does anyone know good resources for this to learn and draw inspiration from?
Is there a reason something more purpose-built, like ParadeDB Community Edition, wouldn't meet your needs?
I understand you're speaking about SQLite, while ParadeDB is PostgreSQL, but as you know, it's non-trivial to get good search quality, so I'm trying to understand your situation and needs.
O'Reilly's "Relevant Search" isn't the worst here, but you'll be porting/writing a bit yourself.
They recently landed multi-writer support in their Rust SQLite re-implementation; the lack of it has personally been the biggest issue I've had with using SQLite for high-concurrency applications.
`PRAGMA journal_mode = 'mvcc';`
https://docs.turso.tech/tursodb/concurrent-writes
Very excited to see if SQLite responds by adding native support, I'm hoping competition here will spur improvements on both sides.
Curious if others here are running SQLite in production and if you would be interested in something like this.
SELECT
settings -> '$.languages' languages
FROM
user_settings
WHERE
settings ->> '$.languages' LIKE '%"en"%';
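The LIKE trick above works because ->> renders an array as its JSON text, but matching on the serialized text is brittle. A tighter alternative expands the array with `json_each()` and compares element values. A runnable sketch of both (the ->/->>  operators need SQLite 3.38+, so it's version-guarded):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_settings (settings TEXT)")
con.execute("INSERT INTO user_settings VALUES (?)",
            ('{"languages": ["en", "de"]}',))
con.execute("INSERT INTO user_settings VALUES (?)",
            ('{"languages": ["fr"]}',))

if sqlite3.sqlite_version_info >= (3, 38, 0):  # -> / ->> need 3.38+
    # the LIKE trick: match the quoted token inside the array's JSON text
    like_rows = con.execute("""
        SELECT settings -> '$.languages'
        FROM user_settings
        WHERE settings ->> '$.languages' LIKE '%"en"%'
    """).fetchall()
    # the json_each alternative: compare actual element values
    each_rows = con.execute("""
        SELECT settings -> '$.languages'
        FROM user_settings
        WHERE EXISTS (SELECT 1 FROM json_each(settings, '$.languages')
                      WHERE json_each.value = 'en')
    """).fetchall()
    print(like_rows, each_rows)
```

Both return the `["en","de"]` row here, but only the `json_each` form keeps working if an element could contain an embedded quote-plus-token substring.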
I use them heavily with my jekyll-sqlite projects. See https://github.com/blr-today/website/blob/main/_config.yml#L... for example.
Granted, it was a small DB (a few megabytes); I also wanted to avoid collecting changes one by one. I simply wanted a diff against last time.
And ON CONFLICT, which can help dedupe (among other things) in a simple and performant way:
insert into item(key, description) values ('thing', 'a thing') on conflict do nothing;
insert into user_note(uid, key, note) values (123, 'thing', 'I like this thing') on conflict (uid, key) do update set note = excluded.note;
Does anyone have any other favorite modern bloom-filter-based search solutions that don't need to store copies of all the documents in the search DB? Ideally something that can run in WASM too, so we can ship a tiny search index to the browser. I found https://github.com/tinysearch/tinysearch but haven't tried it yet.