Posted by tomasol 22 hours ago
Different workflows should probably go in different buckets or "topics" for clarity. Since it's distributed, the system must guarantee that the log items are stored in the same ordering ("offsets") among the nodes.
Not a bad way to do things.
One reason why a "logs are all you need" solution may fail: untrusted-log-as-injection[1].
Check those SBOM, and don't forget to include their CICD pipelines[2].
[1] https://news.ycombinator.com/item?id=48315440
[2] https://github.com/jqwik-team/jqwik/issues/708#issuecomment-...
A distributed WAL (to survive a machine death) would also probably be something I'd want, and … something I'm not sure you're getting directly from SQLite.
I am joining a new project and need to know to what extent Kafka is still a part of the future for new big data projects. It doesn't seem like there are alternatives at the high end but instead the question is when other technologies (that are easier to manage, require less compute, etc.) max out.
"Sockets are all you need for durable workflows" and then finally "Kernel primitives are all you need for durable workflows."
But seriously, part of being a professional is using the right tool for the job.
[0]: https://pglite.dev/
https://sqlite.org/datatype3.html
https://www.postgresql.org/docs/current/datatype.html
Working with date/time feels like using a 30years old database, nothing is enforced at insert. Really someone needs to explain why so many people like it.
The somewhat good: it gets rid of most of the weak typing. It still coerces, in line with other SQL databases, but at least a column will only store values of one type. Personally I’d prefer to opt out of the coercion. And I don’t think most ways of writing SQL (in applications especially, but also manually) will ever actually trigger the strict differences. So it doesn’t feel like it’s actually particularly useful.
The distinctly bad: you’re limited to six datatype names. You may well now want external documentation or load-bearing comments in your schema, and your application code may be hobbled, if it liked to infer types based on the datatype name. For example, in sqlx, SQLite datatype BOOLEAN can automatically map to Rust type bool <https://github.com/transact-rs/sqlx/blob/75bc0487eb661da811b...>. Without that, you have to resort to a variety of less-pleasant techniques, such as selecting `done as "done: bool"` or overriding things in sqlx.toml.
I really, really wish they’d implement some form of CREATE TYPE and let that work with strict tables. If I could `CREATE TYPE BOOLEAN FROM INTEGER` and such, I’d be all in on strict tables.
create table events (
id integer primary key,
name text not null,
event_date text not null check (
-- YYYY-MM-DD
event_date glob '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
and date(event_date) is not null
and date(event_date) = event_date
)
);
In Python that raises this error if the date is invalid: sqlite3.IntegrityError: CHECK constraint failed:
event_date glob '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'Python would show the 1st line always? Or the failed part?
This is unreasonable for a very common type I think.
PRAGMA journal_mode = WAL
PRAGMA foreign_keys = ON
# Something non-null
PRAGMA busy_timeout = 1000
# This is fine for most applications, but see the manual
PRAGMA synchronous = NORMAL
# If you use it as a file format
PRAGMA trusted_schema = OFF
You might need additional options, depending on the binding. E.g. Python applications should not use the defaults of the sqlite3 module, which are simply wrong (with no alternative except out-of-stdlib bindings pre-3.12): https://docs.python.org/3/library/sqlite3.html#transaction-c...Also use strict tables. https://www.sqlite.org/stricttables.html
While it has bad ergonomics, you can also use CHECK constraints. For example, using sqlite's built in date support, it's possible but awkward:
CHECK (
date(my_date_col) IS NOT NULL
AND my_date_col = date(my_date_col)
)
The IS NOT NULL is needed because date returns NULL for invalid dates; the other check because it also accepts Julian days (date('2026') is sometime during year 4707 BC).I agree it is disappointing, especially before strict tables.
You should check out DuckDB which is basically SQLite but with proper types. Although it is also OLAP (struct of arrays) rather than OLTP (array of structs) which may have worse performance for typical SQLite loads. In practice I doubt it matters if you have an application where you're considering either.
maintain an in-memory SQLite db and work it with SQL commands, and if you also want to preserve state across application restarts you can routinely save to disk or load from it: <https://www.sqlite.org/backup.html#example_1_loading_and_sav...>
this also happens to be the most convenient file-format (aka. application-format) I ever worked with.
Check it out here: https://unmeshed.io
Another fascinating fact: our countries TLD has been stolen Ocean's 11 style (I am not kidding). After Czechoslovakia split into Czech Republic and Slovak Republic, the newly created Slovak .sk TLD has been under the care of people from the local university. The university also had some offices that they were leasing out. Someone had leased this office space (EDIT: this is important as this means they had the same physical address), created a company that had the same name as the NGO that was taking care of the domain, so e.g. the NGO was named "My Company o.z." and the perpetrator created a "My Company s.r.o." (our countries version of the american Ltd). This person then wrote to ICANN to change the address to the "My Company s.r.o." presumably under the pretense that this was just an administrative error and from this point, they have functionally taken custody of the TLD. I was not able to find how they did it technically, but I presume they persuaded ICANN to then point to their servers instead of the real ones. After this happened, it seems that no one noticed for some time. When they noticed, they tried taking it back, but they weren't able to. For some inexplicable reason, the government during that time (Šuster era, early 2000s) gave the new company a contract that was functionally uncancellable from the government side. Later governments made this even more uncancellable and in 2017, then Minister of IT (and as of this day president!) Pellegrini made the contract literally uncancellable. As a result of this, we have one of the most expensive domains around (18e/year, rising each year for no good reason). (EDIT:) The company running our countries TLD is now a foreign entity that the whole thing has been sold to (multiple owners over time) and we as a country have no control over if I understand it correctly.
I might have gotten some details wrong as I am writing this from my memory of researching it a couple of years back, but you get the idea, crazy stuff. Here is an article in Czech [0] that tells the story a bit better, but you have to translate it.
[0] https://www.root.cz/clanky/pribeh-domeny-sk-aneb-kradez-za-b...
// EDIT: I have found that the article actually links the movement to return the TLD back [1]. It also has a story tab [2], so they have something much more precise than the paraphrasing I wrote.
Obligatory list of workflow engines and libraries because it's such a common need that a lot have rolled their own. [1]
[0] https://docs.dbos.dev/python/tutorials/database-connection