The talk contains a lot more detail on how the transactional and replication layers of Graft work.
A hybrid approach is to detect slow syncing (for example, when a sync hasn't completed after 5 seconds) and fall back to sending queries directly to the server, since there is a good chance the task the user wants to complete doesn't depend on the bloated records.
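A minimal sketch of that fallback pattern, assuming hypothetical `waitForSync`, `queryLocal`, and `queryServer` functions (illustrative names, not Graft APIs):

```typescript
// Illustrative only: waitForSync(), queryLocal(), and queryServer()
// are hypothetical stand-ins, not real Graft APIs.
async function queryWithFallback<T>(
  sql: string,
  waitForSync: () => Promise<void>,
  queryLocal: (sql: string) => Promise<T>,
  queryServer: (sql: string) => Promise<T>,
): Promise<T> {
  const timeout = new Promise<string>((resolve) =>
    setTimeout(() => resolve("timeout"), 5_000),
  );
  // Race the sync against a 5 second deadline.
  const outcome = await Promise.race([
    waitForSync().then(() => "synced"),
    timeout,
  ]);
  if (outcome === "synced") {
    return queryLocal(sql); // local replica is fresh enough
  }
  // Sync is slow (perhaps dragging in bloated records the user
  // doesn't need right now), so ask the server directly instead.
  return queryServer(sql);
}
```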
This is why Graft isn't just focused on client sync. By expanding the focus to serverless functions and the edge, Graft can run the exact same workload on the exact same snapshot anywhere, and it can validate that snapshot trivially thanks to its consistency guarantees. This means that a client's workload can be trivially moved to the edge, where there may be more resources, a better network connection, or existing cached state.
I mean, it's obviously about syncing stuff (despite the title), ok. It "simplifies the development", "shares data smoothly", and all the other nice things that everything else does (or claims to do). And I can use it to implement anything where replication of data might be useful (so, everything). Cool, but... sorry, what does it do, exactly?
The biggest problem with syncing is, obviously, conflict resolution. Graft "doesn't care about what's inside those pages", so, obviously, it cannot resolve conflicts. So if I'm using it in a note-taking app, as suggested, every unsynced change to a plain text file will result in a conflict. So, I suppose, that isn't what it's for at all; it's just a mechanism to handle replication between 2 SQLite files when there are no conflicts between statements (so, what MySQL or Postgres do out of the box). Right? So, will it replace the standard SQLite driver in my app code to route all requests via some Graft-DB that will send my statements to an external Graft instance as well as to my SQLite storage? Or what?
> Licensed under either of
> Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
> MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
> at your option.
I see this now and then, but it makes me wonder: why would I pick Apache over MIT in this case? Or is this software actually Apache licensed, but the developer is giving you the green light to use it under the terms of the MIT license? But at that point I don't get why not just license it all under MIT to begin with...
> The Apache license includes important protection against patent aggression, but it is not compatible with the GPL, version 2. To avoid problems using Rust with GPL2, it is alternately MIT licensed.
The Rust API guidelines also recommend the same: https://rust-lang.github.io/api-guidelines/necessities.html#...
[1]: https://github.com/dtolnay/rust-faq#why-a-dual-mitasl2-licen...
From my perspective (as the author of Graft) my goal was to be open source and as compatible as possible with the Rust ecosystem. Hence the choice to dual license.
As long as actions are immutable and any non-deterministic inputs are captured in the arguments, they can be (re)executed in total clock order from a known common state in the client database to arrive at a consistent state, regardless of when clients sync. A benefit I realized is that this works perfectly with authentication/authorization using Postgres row-level security. It's also efficient, letting clients sync the minimal amount of information and handle conflicts while still giving the server full authority over what clients can write.
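A sketch of what that total-order replay could look like, assuming each logged action carries a Lamport clock plus a client id for tiebreaking (all names here are hypothetical, not from the parent's project):

```typescript
// A minimal local-db interface for the sketch.
interface Database {
  exec(sql: string, params?: unknown[]): void;
}

interface LoggedAction {
  clock: number;    // Lamport timestamp
  clientId: string; // tiebreaker, so ordering is total
  name: string;     // which action function to run
  args: unknown;    // captured non-deterministic inputs
}

type ActionFn = (db: Database, args: unknown) => void;

// Re-execute every action in total clock order from a known common
// state; every replica that does this converges to the same result.
function replay(
  db: Database, // assumed already rolled back to the common snapshot
  log: LoggedAction[],
  actions: Record<string, ActionFn>,
): void {
  const ordered = [...log].sort(
    (a, b) => a.clock - b.clock || a.clientId.localeCompare(b.clientId),
  );
  for (const action of ordered) {
    actions[action.name](db, action.args);
  }
}
```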
There's a lot more detail involved in actually making it work: triggers to capture row-level patches and reverse patches in a transaction while executing an action; a client-local rollback mechanism that resolves conflicts by rolling back local db state and replaying actions in total causal order; state patch actions that reconcile the differences between expected and actual outcomes of replaying actions (for example, due to private data and conditionals); and so on.
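For the trigger piece, here is a guess at the shape of capturing row-level patches and reverse patches (a sketch using better-sqlite3 and SQLite's built-in json_object; the `patches` table and `todos` schema are hypothetical, not the parent's actual design):

```typescript
import Database from "better-sqlite3";

const db = new Database("app.db");

// Hypothetical schema: a patch log alongside an ordinary app table.
db.exec(`
  CREATE TABLE IF NOT EXISTS todos (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    done INTEGER NOT NULL DEFAULT 0
  );

  CREATE TABLE IF NOT EXISTS patches (
    id INTEGER PRIMARY KEY,
    tbl TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    patch TEXT NOT NULL,   -- new values, to ship to the server
    reverse TEXT NOT NULL  -- old values, for local rollback
  );

  -- Every update inside an action transaction is captured as a
  -- forward patch and a reverse patch, automatically.
  CREATE TRIGGER IF NOT EXISTS todos_capture_update
  AFTER UPDATE ON todos
  BEGIN
    INSERT INTO patches (tbl, row_id, patch, reverse)
    VALUES (
      'todos',
      NEW.id,
      json_object('title', NEW.title, 'done', NEW.done),
      json_object('title', OLD.title, 'done', OLD.done)
    );
  END;
`);
```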
The big benefit of this technique is that it isn't just merging data; it's actually executing business logic to move state forward. That means it captures user intentions where a system based purely on merging data cannot. A traditional CRDT that merges data will end up at a consistent state, but it can provide zero guarantees about the semantic validity of that state to the end user. By replaying business logic functions, I'm seeking to guarantee that the state is not only consistent but also maximally preserves the intentions of the user when reconciling interleaved writes.
This is still a WIP and I don't have anything useful to share yet, but I think the core of the idea is sound. Exciting to see so much innovation in the space of data sync! It's a tough problem, and no solution (yet) handles the use cases of many different types of apps.
The actions in my prototype are just TS functions (actually Effects, https://effect.website/, but same idea) that can arbitrarily read and write to the client-local database. This does put some restrictions on the app -- it has to define all mutations inside of actions and capture any non-deterministic inputs other than database access (random numbers, time, network calls, etc.) as part of the arguments. Beyond that, what an app does inside of the actions can be entirely arbitrary.
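For example, an action in this style might look something like the following (plain async TypeScript rather than Effect, with hypothetical names and schema):

```typescript
// A minimal local-db interface for the sketch.
interface Database {
  run(sql: string, params: unknown[]): Promise<void>;
}

interface CreateTodoArgs {
  id: string;        // e.g. crypto.randomUUID(), generated once at call time
  createdAt: number; // e.g. Date.now(), captured once at call time
  title: string;
}

// All non-determinism lives in `args`, so replaying this action on
// another client (or after a local rollback) is deterministic.
async function createTodo(db: Database, args: CreateTodoArgs): Promise<void> {
  await db.run(
    "INSERT INTO todos (id, created_at, title, done) VALUES (?, ?, ?, 0)",
    [args.id, args.createdAt, args.title],
  );
}
```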
I think that hits the sweet spot between flexibility, simplicity, and consistency. The action functions can always handle divergence in whatever way makes sense for the application. Clients will always converge to the same semantically valid state because state is always advanced by business logic, not patches.
Patches are recorded, but only for application to the server's database state and for checking divergence from expected results when replaying incoming actions on a client. It should create very little load on the backend server because it does not need to execute action functions; it can just apply patches with the confidence that the clients have resolved any conflicts in a way that makes the most sense.
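On the server, that could be as simple as applying the patch log in order with no business logic at all (a sketch reusing the hypothetical patch shape from above, updates only for brevity):

```typescript
interface ServerDb {
  run(sql: string, params: unknown[]): Promise<void>;
}

interface Patch {
  tbl: string;                    // table name
  rowId: number;                  // affected row
  patch: Record<string, unknown>; // column -> new value
}

// The server never executes action functions; it trusts that clients
// already replayed them, and simply applies the resulting patches in order.
async function applyPatches(db: ServerDb, patches: Patch[]): Promise<void> {
  for (const p of patches) {
    const cols = Object.keys(p.patch);
    const assignments = cols.map((c) => `${c} = ?`).join(", ");
    await db.run(`UPDATE ${p.tbl} SET ${assignments} WHERE id = ?`, [
      ...cols.map((c) => p.patch[c]),
      p.rowId,
    ]);
  }
}
```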
It's fun and interesting stuff to work on! I'll have to take a closer look at SQLSync for some inspiration.
Any chance your project is public? I'd love to dig into the details. Alternatively would you be willing to nerd out on this over a call sometime? You can reach me at hello [at] orbitinghail [dotdev]
Using Effect really doesn't introduce anything special -- plain async TypeScript functions would work fine too. My approach is certainly a lot less sophisticated than yours with Graft or SQLSync! I just like using Effect. It makes working with TypeScript a lot more pleasant.
Beta ETA?
I'm curious if the Graft solution helps with this. The idea of just being able to ship a SQLite db to a mobile client that you can also mutate from a server is really powerful. I ended up basically building my own sync engine to sync changes between clients and servers.
As for the more general question, though: by shipping pages you will often ship more data than the equivalent logical replication approach would. This is a tradeoff you make for a much simpler approach to strong consistency on top of arbitrary data models.
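To put rough numbers on that tradeoff (a back-of-the-envelope sketch; the byte counts on the logical side are illustrative guesses, and the page size is SQLite's 4096-byte default, not a Graft constant):

```typescript
const PAGE_SIZE = 4096; // SQLite's default page size; configurable

// Worst case: flipping one boolean in one row dirties a whole page.
const logicalBytes = 1 /* new value */ + 8 /* row id */ + 16 /* framing */;
const pageBytes = PAGE_SIZE; // the entire dirty page is shipped

console.log(`logical: ~${logicalBytes} B, pages: ${pageBytes} B`);
// => logical: ~25 B, pages: 4096 B, i.e. over 160x more for this
// worst case. That amplification is the price of the simpler model.
```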
I'd love to learn more about the approach you took with your sync engine! It's so fun how much energy is in the replication space right now!
Basically, Graft is not the full story, but, as you point out, because it's so simple it's easy to build different solutions on top of it.
[^1]: Either across Volumes or across pages; Graft can automatically merge non-overlapping page sets.
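A sketch of what "non-overlapping" means in practice, modeling page sets as plain sets of page indexes (illustrative only, not Graft's actual data structure):

```typescript
// Two concurrent write sets merge automatically iff they are disjoint.
function canAutoMerge(a: Set<number>, b: Set<number>): boolean {
  for (const page of a) {
    if (b.has(page)) return false; // both writers touched this page: conflict
  }
  return true;
}

canAutoMerge(new Set([1, 2]), new Set([5, 6])); // true: disjoint, safe to merge
canAutoMerge(new Set([2, 3]), new Set([3, 4])); // false: both touched page 3
```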