One of the things that puts metaprogramming on wheels is being able to pull in grammars, schemas, and the like from external systems. Think how many cool tools we could have if we could do this in a more general way.
That's what I've been trying to do with: https://github.com/schemamap/schemamap
For a well-constrained Postgres schema, a deterministic SQL compiler can be built (with plenty of logic programming, constraint solving, and hypergraph analysis) that can integrate data from arbitrary external systems.
While this is NP-hard in general, with some clever use of embeddings and ontologies, plus using every single constraint AST within the DB to reduce the search space, it becomes a feasible problem.
For any Clojurists interested, I've packaged `pg_query`, so you can use it in your applications: https://github.com/schemamap/pg-query-clj
At the moment I'm saving up for the next development cycle, so that more than just PG->PG schema mappings can be solved for (JSON Schema is next!). Hope this sounds interesting :)
Supabase's LSP also uses tree-sitter for corrections and autocomplete, because one drawback of using the server's source is that pg_query only handles well-formed/executable SQL: when it detects a malformed query, it reports an error and bails out, since that's what you want in an SQL server. So for partially-correct syntax and as-you-type completion, tree-sitter covers the gaps.
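To make that concrete, here's a minimal sketch with the Rust `pg_query` bindings (same libpg_query underneath; the table name is made up):

```rust
fn main() {
    // Well-formed SQL parses into a complete AST.
    assert!(pg_query::parse("SELECT * FROM contacts WHERE id = 1").is_ok());

    // A half-typed statement fails outright: libpg_query embeds the server's
    // parser, which reports an error instead of recovering, so there is no
    // partial tree for autocomplete to work from. That's the gap tree-sitter fills.
    assert!(pg_query::parse("SELEC * FROM contacts WHERE").is_err());
}
```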
Over the years we've only had to maintain a small set of patches on top of the Postgres source [0], together with some mocks and our libclang-based extraction logic [1]. Of course it would be nice if Postgres just packaged this directly like it packages the client libraries, but there is non-trivial effort involved to do that. From what I recall, the main issue is that error handling and memory allocations work differently in the client-side libraries (and so that would have to either also be moved out of the backend source, or use some other abstraction).
[0]: https://github.com/pganalyze/libpg_query/tree/17-latest/patc...
[1]: https://github.com/pganalyze/libpg_query/blob/17-latest/scri...
Re: pgcat and longevity, it's actually the opposite. PgCat is/was an open source project that I worked on in my spare time. Thankfully I found a couple of engineers at large corps (along with the awesome community contributions) to push it over the line.
PgDog is a startup, so this is now my full time job.
You should mention the timeline of the projects leading up to PgDog in your marketing. I looked (very briefly) for references to pgcat but didn't see any. Your background will bring a lot of peace of mind to technology adopters in big enterprises.
Additionally, and maybe this is just a pet peeve with Postgres documentation in general, the docs use "database" to mean both a single database and a Postgres server interchangeably. On some pages this makes the content harder to understand. I think it's good practice to distinguish between databases and "clusters" (no, not those clusters; in Postgres parlance a cluster is a single server's collection of databases), and perhaps between clusters and groups of servers. Some of the naming is unfortunate for historical reasons, but re-using the same names for different concepts just causes more confusion.
Medium term, I can see it detecting foreign keys automatically (if the constraints exist).
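For what it's worth, the catalogs already expose those edges, so the discovery part is cheap. A minimal sketch with the Rust `postgres` crate (the connection string is a placeholder):

```rust
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    // Placeholder connection string; adjust for your environment.
    let mut client = Client::connect("host=localhost user=postgres", NoTls)?;

    // Every foreign key is a row in pg_constraint with contype = 'f';
    // conrelid is the referencing table, confrelid the referenced one.
    let rows = client.query(
        "SELECT conrelid::regclass::text,
                confrelid::regclass::text,
                conname::text
         FROM pg_constraint
         WHERE contype = 'f'",
        &[],
    )?;
    for row in rows {
        let (child, parent, name): (String, String, String) =
            (row.get(0), row.get(1), row.get(2));
        println!("{child} -> {parent} (constraint {name})");
    }
    Ok(())
}
```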
If you have users and posts, with posts having a user_id field referencing users.id, and you shard the users table on id, how does it behave?
Now I request posts for a user id: is it smart enough to map the posts.user_id field to users.id? Where are the posts stored? What if one user has all of the posts, so my data ends up unbalanced across shards? Can I shard on posts.id? If so, what are the implications for queries that join to users?
The naive answer is to break foreign-key integrity in the database. But the site specifically says that PgDog doesn't require sacrificing data safety.
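The usual answer in sharded systems (I don't know PgDog's internals, so this is the general technique, not a claim about its implementation) is to shard posts on user_id rather than post.id, so each user's posts are co-located with the user row. A toy sketch of the invariant:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy routing hash; a real router would use a stable, documented hash.
fn shard_of(key: u64, num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() % num_shards
}

fn main() {
    let num_shards = 4;
    let user_id = 42u64;

    // users is sharded on users.id; posts is sharded on posts.user_id.
    // Both rows route by the same key value, so a user and all of their
    // posts land on one shard: the FK stays locally enforceable, and the
    // users-to-posts join never crosses shards.
    let user_shard = shard_of(user_id, num_shards);
    let post_shard = shard_of(user_id, num_shards); // routed by user_id, not post.id
    assert_eq!(user_shard, post_shard);
}
```

The trade-off is exactly the one raised above: a single prolific user keeps all their posts on one shard, so hot keys can skew the balance. Sharding posts on post.id instead balances rows evenly, but then the FK check and the join become cross-shard operations.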
And the fact that this works at all is another example of why Postgres is the default data store on the internet. It's battle tested and has a massive ecosystem of support tools, like this one.
I'm impressed both that the Rust ecosystem had the right library available and that it was high enough quality code for the author's team to use it as a reference for building their own additional functionality.
[1] https://github.com/pganalyze/libpg_query/blob/15-latest/READ...
I am wondering why it took so long for something like this to happen for Postgres, whereas Vitess has existed for MySQL for more than a decade. I am not an expert, but I hear Citus is not really a fair comparison with Vitess.
We were thinking about adding PostgreSQL support to Vitess several years ago. However, we realized that it would be more efficient if each tool was focused on a specific task.
https://github.com/vitessio/vitess/issues/7084#issuecomment-...
That’s today. The project is developing fast, so I am sure more things will be added :)
[1] https://docs.rs/pg_query/6.0.0/pg_query/protobuf/struct.Sele...
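For anyone curious what working with that struct looks like, a rough sketch against the `pg_query` crate's protobuf bindings (field names per the generated code; not production-quality traversal):

```rust
use pg_query::NodeEnum;

fn main() -> Result<(), pg_query::Error> {
    let result = pg_query::parse("SELECT id FROM users WHERE id = 25")?;

    // Each top-level statement arrives as a RawStmt wrapping a Node; a SELECT
    // shows up as the SelectStmt variant, with typed fields (target_list,
    // where_clause, ...) that a router can walk to find the sharding key.
    if let Some(NodeEnum::SelectStmt(select)) =
        result.protobuf.stmts[0].stmt.as_ref().and_then(|s| s.node.clone())
    {
        println!("has WHERE clause: {}", select.where_clause.is_some());
        println!("target-list entries: {}", select.target_list.len());
    }
    Ok(())
}
```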
The Neon approach of decoupling storage from processing, while keeping the processing of a query local to one server, seems better to me, but maybe I am missing something.
I know it sounds silly/crazy, but I have a use case where I would like to allow "mostly" trusted users to access the database directly rather than through an API, but I need to apply some restrictions on their output.
One thing I was thinking of doing is generating query plans asynchronously and blocking/cancelling queries that would otherwise be expensive and cause downtime. That's on the roadmap.
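A simple synchronous version of that idea can be sketched client-side with the Rust `postgres` crate (the cost threshold, connection handling, and text parsing here are all made up for illustration; this is not PgDog's implementation):

```rust
use postgres::Client;

// Ask the planner for its estimate first, and refuse to forward the query
// if the estimated total cost exceeds a budget.
fn run_if_cheap(client: &mut Client, sql: &str, max_cost: f64) -> Result<(), String> {
    let explain = format!("EXPLAIN {sql}");
    let rows = client.query(explain.as_str(), &[]).map_err(|e| e.to_string())?;

    // The first plan line looks like:
    // "Seq Scan on posts  (cost=0.00..35.50 rows=2550 width=8)"
    let first: String = rows[0].get(0);
    let total_cost: f64 = first
        .split("..")
        .nth(1)
        .and_then(|s| s.split_whitespace().next())
        .and_then(|s| s.parse().ok())
        .ok_or("could not parse plan cost")?;

    if total_cost > max_cost {
        return Err(format!("rejected: estimated cost {total_cost} > {max_cost}"));
    }
    client.execute(sql, &[]).map_err(|e| e.to_string())?;
    Ok(())
}
```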
This makes it sound like the query would be routed to all shards except the one matching HASH(25). But wouldn't id != 25 need to go to all shards, since there could be values in the id set that hash to the same shard as 25?
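A toy sketch shows why (with `DefaultHasher` standing in for whatever stable hash the real router uses):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn shard_of(id: u64, num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    id.hash(&mut h);
    h.finish() % num_shards
}

fn main() {
    let num_shards = 4;

    // Equality on the sharding key routes to exactly one shard.
    let shard_25 = shard_of(25, num_shards);
    println!("id = 25  -> shard {shard_25}");

    // But `id != 25` has to visit every shard, including shard_25: other
    // ids hash to that same shard, and their rows live there too.
    let collider = (0u64..)
        .find(|&n| n != 25 && shard_of(n, num_shards) == shard_25)
        .unwrap();
    println!("id = {collider} also lives on shard {shard_25}");
}
```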