Little nitpick there: consensus anti-scales. You add more nodes and it gets slower. The rest of the section on rqlite and dqlite makes sense though, just not the part about "scale".
I've changed the wording to "They are designed to keep a set of stateful nodes that maintain connectivity to one another in sync.". Thank you!
From an extremely brief scan, it appears that Erlang wrappers around SQLite should be able to use the Graft SQLite extension just fine.
Alternatively, it would be reasonably straightforward to wrap the Graft Client (a Rust library) directly in an Erlang NIF using something like https://github.com/rusterlium/rustler
Let's make it happen! :)
Much appreciated, I'll keep you informed :)
First, a caveat: Graft currently has no permissions. Anyone with access to the Graft PageStore and MetaStore can read/write to any volume. This is obviously going to change - so I'll talk about what's planned rather than what exists. :)
For writes, Graft can support fairly granular permission models. This is an advantage of handling writes in the PageStore. Depending on the data being stored in a Volume, a future PageStore version could reject writes based on inspecting the uploaded pages. This would increase the load on the PageStore, but since it's designed to run on the edge and horizontally scale like crazy (it's stateless), it seems like it would work.
Reads, on the other hand, are a lot trickier. The simplest approach is to partition data across Volumes such that you can enforce read permissions at the Volume level. This isn't a great solution and will certainly limit the kinds of workloads that are well aligned with Graft. A more complex approach is to layer Volumes, effectively virtualizing a single database that internally writes rows to different layers depending on access permissions. This second approach offers a slightly nicer user experience, at the cost of complexity and query performance.
For now though, Graft is best suited to workloads that can partition data and permissions across Volumes.
As an example, let's say you're building something like Google Sheets on top of Graft. Each document would be an independent Volume. This matches how sharing works in Google Sheets: each user added to the Volume could be granted either read or write permission on the entire sheet.
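To make the Volume-per-document idea concrete, here's a tiny sketch of what volume-level sharing could look like for a Sheets-like app. It's purely illustrative; VolumeId, Access, and VolumeAcl are hypothetical names, not Graft APIs, since Graft doesn't actually have a permission model yet:

    // Purely illustrative: one Volume per document, with read or write
    // granted per user on the whole Volume. None of these types exist in Graft.
    enum class Access { READ, WRITE }

    data class VolumeId(val value: String)

    class VolumeAcl {
        private val grants = mutableMapOf<VolumeId, MutableMap<String, Access>>()

        fun share(volume: VolumeId, userId: String, access: Access) {
            grants.getOrPut(volume) { mutableMapOf() }[userId] = access
        }

        // Any grant on the Volume allows reading the whole sheet.
        fun canRead(volume: VolumeId, userId: String) =
            grants[volume]?.containsKey(userId) == true

        // Only an explicit WRITE grant allows uploading pages to the Volume.
        fun canWrite(volume: VolumeId, userId: String) =
            grants[volume]?.get(userId) == Access.WRITE
    }

Sharing a sheet then just means adding a grant on that document's Volume; anything finer-grained than that is where the layering approach above would come in.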
It seems like your solution requires essentially sharding your data based on permissions, which can get pretty complicated for many collaboration-based apps with lots of shared content and granular permission controls.
Either way, it's 100% a great idea that I'd like to explore. If any DuckDB contributors are reading I'd love to know if it would work!
Looks really good, great work!
Could you theoretically use it for e.g. DuckDB? (Maybe not now, but with some work further down the line.) What about a graph DB like KuzuDB? Or is it SQL only?
I've filed an issue for both:
Our system is map based, so we are dealing with a lot of map content that updates often (e.g. location tracking).
v0 of our system was a failed attempt at using mongo realm before I joined. One of my first projects as the CTO of this company was shaking my head at that attempt and unceremoniously deleting it. It was moving GBs of data around for no good reason (we only had a few hundred records at that point), it was super flaky/buggy, and I never liked mongo to begin with; this was just a mess that was never going to work. We actually triggered a few crash bugs in mongo cloud that caused data loss at some point. Probably because we were doing it wrong (somehow), but it made it clear to me that this was just wrong at many levels. The key problem with realm was that it was a product aimed at junior mobile developers with zero clue about databases. Not a great basis to start engineering a scalable, multi-user system that needs to store a world full of data (literally, because geospatial).
We transitioned to an Elasticsearch-based system to query for objects to show on a map. Doing that all the time gets expensive, so we quickly started thinking about caching objects locally. v1 of that system served us for about two years and was based on a WASM build of SQLite together with some experimental SQLDelight (a Kotlin Multiplatform framework). This worked surprisingly well given the alpha state of the ecosystem and libraries. But there are some unrelated gotchas when you want to package things up as a PWA, which requires being a bit strict about the security model in the browser and runs into conflicting requirements for OPFS (one of the options for local storage). Safari/iOS in particular is a bit picky on this front. We got it working but it wasn't nice.
At some point I decided to try IndexedDB and just get rid of a lot of complexity. IndexedDB is an absolutely horrible piece of sh* as a JavaScript API. But with some Kotlin coroutine wrappers, I got it to do what I wanted, and unlike OPFS it works pretty much in all browsers. It also has similarly relaxed storage quotas, so you should be able to cram tens/hundreds of MB of data in there without issues (more might work but is probably not a great idea for sync performance reasons). Its querying is much more limited, but it works for our mostly simple access pattern of getting and storing stuff by id only and maybe doing some things with timestamps, keyword columns, etc.
If somebody is interested, I put a gist here with the Kotlin file that does all the interaction with IndexedDB: https://gist.github.com/jillesvangurp/c6923ac3c6f17fa36dd023...
This is part of another project I'm working on (parked half a year ago) that will be OSS (MIT license) at some point. I built that first and then decided to lift the implementation and use it in my map product (closed source). It has some limitations; transactional callback hell is a thing I need to tackle at some point. Mostly you use it like a glorified Map<String, T>, where T is anything that you can convert to/from JSON via kotlinx.serialization.
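To give a feel for the "glorified Map<String, T>" shape without dumping the whole IndexedDB plumbing here, this is roughly the API surface with an in-memory stand-in for the object store. Names like JsonKeyValueStore and Marker are made up for the example, and it assumes kotlinx-serialization is on the classpath; the real thing wraps IndexedDB transactions in suspend calls instead of a map.

    import kotlinx.serialization.Serializable
    import kotlinx.serialization.decodeFromString
    import kotlinx.serialization.encodeToString
    import kotlinx.serialization.json.Json

    // Sketch of the store's shape: values go in and out as JSON strings,
    // which is also how the IndexedDB-backed version persists them. The
    // mutable map below is just a stand-in for an IndexedDB object store.
    @Serializable
    data class Marker(val id: String, val lat: Double, val lon: Double)

    class JsonKeyValueStore<T>(
        private val serialize: (T) -> String,
        private val deserialize: (String) -> T,
    ) {
        private val backing = mutableMapOf<String, String>()

        suspend fun put(id: String, value: T) { backing[id] = serialize(value) }
        suspend fun get(id: String): T? = backing[id]?.let(deserialize)
        suspend fun delete(id: String) { backing.remove(id) }
    }

    suspend fun demo() {
        val store = JsonKeyValueStore(
            serialize = { m: Marker -> Json.encodeToString(m) },
            deserialize = { s -> Json.decodeFromString<Marker>(s) },
        )
        store.put("m1", Marker("m1", 52.52, 13.405))
        println(store.get("m1")) // Marker(id=m1, lat=52.52, lon=13.405)
    }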
We're currently working on adding geospatial filtering so we can prioritize the areas the user is using and delete the areas they are not using. We have millions of things worldwide (too many to fetch), but typical usage focuses on a handful of local areas. So we don't need to fetch everything all the time and can get away with only fetching a few tens/hundreds of things. But we do want caching, personalization, and real-time updates from others to be reflected. And possibly offline support later. So the requirements around this are complicated.
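As a rough illustration of the kind of eviction we have in mind (hypothetical names, not our actual code): keep cached objects that fall inside recently viewed bounding boxes and drop everything else.

    // Illustrative only: evict cached map objects that are outside all
    // recently viewed areas. BoundingBox and CachedObject are made-up names.
    data class BoundingBox(val minLat: Double, val maxLat: Double, val minLon: Double, val maxLon: Double) {
        fun contains(lat: Double, lon: Double) =
            lat in minLat..maxLat && lon in minLon..maxLon
    }

    data class CachedObject(val id: String, val lat: Double, val lon: Double)

    fun evictOutsideRecentAreas(
        cache: MutableMap<String, CachedObject>,
        recentAreas: List<BoundingBox>,
    ) {
        cache.values
            .filterNot { obj -> recentAreas.any { it.contains(obj.lat, obj.lon) } }
            .forEach { cache.remove(it.id) }
    }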
We're actually deprioritizing local storage because, after putting our API on a diet, we don't fetch that much data even without caching. A few KB on map reposition, typically; the map tiles are larger.
Offline is something that generates a lot of interest from customers, but that's mostly because mobile networks suck in Germany.