Posted by tosh 3 days ago

OrioleDB Patent: now freely available to the Postgres community (supabase.com)
408 points | 152 comments
dangoodmanUT 3 days ago|
> OrioleDB tables don't support SERIALIZABLE isolation level.

This is an unfortunate limitation to be aware of when evaluating it.

https://www.orioledb.com/docs/usage/getting-started#current-...

btown 3 days ago||
For what it's worth, this does appear to be just a temporary situation, as mentioned in that linked document and in code comments e.g. https://github.com/orioledb/orioledb/blob/7f3b3a9a8e195ba31f...
MrHamdulay 3 days ago||
that message was put there 2 years ago. soon may not be coming.
akorotkov 3 days ago|||
We will eventually add the SERIALIZABLE isolation level to OrioleDB, but right now that's not our highest priority. Let me explain why: SSI (serializable snapshot isolation) in PostgreSQL comes with significant shortcomings, including:

1) Overhead. SSI takes a heavyweight lock on every involved index page or heap tuple (even for reads). The overhead of SSI was initially measured at ~10%, but PostgreSQL's scalability has advanced a lot since then, and this many heavyweight locks can slow a typical workload on a multicore machine down several times over.

2) SSI requires the user to be prepared to retry any transaction after a serialization failure. Even a read-only transaction needs to be DEFERRABLE or it might be aborted with a serialization failure (it might have "seen impossible things" and need a retry).

In contrast, it's typically not hard to resolve the concurrency problems of writing transactions using explicit row-level and advisory locks, while REPEATABLE READ is enough for reporting. Frankly speaking, in my whole career I haven't seen a single case where the SERIALIZABLE isolation level was justified.
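
For illustration, a rough sketch of those techniques (the accounts table and the advisory-lock key are hypothetical):

    -- Row-level lock: serialize writers touching the same row.
    BEGIN;
    SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    COMMIT;

    -- Advisory lock on an application-chosen key;
    -- pg_advisory_xact_lock() is released automatically at commit/rollback.
    BEGIN;
    SELECT pg_advisory_xact_lock(42);
    -- check invariants here, then write
    COMMIT;

    -- Reporting: a stable snapshot without SSI's predicate locks.
    BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
    -- run the reports
    COMMIT;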

tux3 3 days ago|||
I had a case just recently where we needed to enforce a no-overlap constraint too complicated to express in an index (recurring time ranges).

Any time you have to check a constraint manually, you can't just do it before or after the write, because two REPEATABLE READ write transactions will not see each other's INSERTs (see the interleaving below).

You need something like a lock, a two-phase commit, or SERIALIZABLE isolation for writes. Advisory locks have sharp edges, and 2PC is not so simple either, there is a lot that can go wrong.

In the case of SERIALIZABLE you do need to retry on conflict, but usually the serialization anomalies can be limited to a reasonably fine granularity. And an explicit retry feels safer than risking a livelock when there is contention.
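
To make that race concrete, here's the interleaving with a toy bookings table (hypothetical; run in two psql sessions):

    -- session A:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM bookings WHERE room = 1;  -- 0, no conflict seen

    -- session B, concurrently:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM bookings WHERE room = 1;  -- also 0

    -- session A:
    INSERT INTO bookings (room) VALUES (1);
    COMMIT;

    -- session B:
    INSERT INTO bookings (room) VALUES (1);  -- doesn't block: plain INSERTs never conflict
    COMMIT;

    -- Both rows commit: each snapshot predates the other's INSERT, so the
    -- manual check passed in both sessions. Under SERIALIZABLE, SSI would
    -- instead abort one of the two with a serialization failure.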

akorotkov 2 days ago||
Exclusion constraints should work well for this case, at any transaction isolation level.

https://www.postgresql.org/docs/current/ddl-constraints.html...
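
For a plain (non-recurring) no-overlap rule, a minimal sketch (names are made up; btree_gist is what lets = and && combine in one constraint):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE bookings (
        room   int,
        during tstzrange,
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );

    INSERT INTO bookings VALUES (1, '[2025-01-01 09:00, 2025-01-01 10:00)');
    INSERT INTO bookings VALUES (1, '[2025-01-01 09:30, 2025-01-01 11:00)');
    -- ERROR: conflicting key value violates exclusion constraint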

tux3 2 days ago||
That's indeed the first thing you would want to try. In this case these time ranges can repeat according to a formula, like a cron job for example.

An exclusion constraint needs concrete values to compare, but here we can't pre-compute and index every future value (there are infinitely many).

We solve a Diophantine equation for this check (if there is a solution to Ax - By = 0, then formulas A and B can conflict at some point).

btown 2 days ago||
This seems like a fascinating problem domain! I'm incredibly curious what product you're working on - if you're taking this much attention to detail on recurring events, it seems like it's a step beyond most scheduling systems I've seen.

Heck, https://www.sciencedirect.com/science/article/pii/S147466701... and https://www.amazon.com/Declarative-Models-Concurrent-Cyclic-... seem to indicate this is still an area of active research. And, to your point, almost certainly too complex to try to encode into a Postgres index - that would be a paper-worthy project unto itself!

(I've had to implement the naive approach here, which nested-loops over dates and rules and finds ones that might conflict. Definitely not meant to scale, and would be a nightmare if not at a date-level granularity!)

No pressure to share, but would love to learn more!

tux3 2 days ago||
It's a scheduling system in the medical field where recurring events can happen in a particular room, this might be every N days at a particular time, or every three weeks on Tuesday and Friday between some start date and some end date.

I'm not entirely familiar with the literature, so we did some simplifications to keep it manageable, and it's very possible there might be a better approach!

We model the events generated by our recurring rules as intervals [SA+n*A, SA+n*A+DA), where A is a constant integer number of days (e.g. A=3*7 for an event on Tuesday every 3 weeks), SA is the start date, and DA is the duration of an event (... which can be >24h to make things more interesting). It might continue recurring forever, or it might stop by some end date.

Now if you have two of these recurring events with periods A and B, you can think of each of them as a periodic function, and the combination of them will have a period equal to LCM(A, B). So we could check everything modulo LCM(A, B) once, and that would hold infinitely far forward.

But actually, we can do even better by taking the difference Δ = (SA - SB) mod GCD(A, B). You can sort of think of it as the closest approach between these two recurring events: if Δ < DA or Δ > GCD(A, B) - DB, these events are eventually going to overlap. There are some extra details to check whether the overlap happens before either end date (if any), but that's the idea in a nutshell.
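
Amusingly, that check fits in a few lines of SQL now that Postgres 13+ has a built-in gcd(). A sketch under the simplifications above (whole days, no end dates, no DST; the function name is made up):

    CREATE FUNCTION recurrences_overlap(
        sa int, a int, da int,   -- start, period, duration of event A (days)
        sb int, b int, db int    -- start, period, duration of event B (days)
    ) RETURNS boolean LANGUAGE sql IMMUTABLE AS $$
        WITH d AS (
            SELECT gcd(a, b) AS g,
                   ((sa - sb) % gcd(a, b) + gcd(a, b)) % gcd(a, b) AS delta
        )
        SELECT delta < da OR delta > g - db FROM d;
    $$;

    SELECT recurrences_overlap(0, 21, 1, 10, 28, 1);  -- false (delta = 4: never collide)
    SELECT recurrences_overlap(0, 21, 1,  7, 28, 1);  -- true  (delta = 0: e.g. day 63)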

---

Where it gets really interesting is when you introduce timezones and daylight savings time (DST). Now a day is not always 24h, so there isn't a nice [SA+n*A, SA+n*A+DA) at regular intervals, and we can't naively work modulo the LCM or GCD anymore.

But we can notice that the days of the year on which a DST transition falls generally follow simple rules, and would repeat every 28 years (actually every 400 years due to leap years). In practice for a given timezone, for example CEST, there are only 14 unique days of the year on which the DST transitions can fall, and it repeats after a full period.

So if we want to check for overlap, we can treat those DST transition days as special cases, and all the time intervals in between DST transition days look locally "UTC-like" (time doesn't jump forward or backwards, if we ignore leap seconds), so the previous formula continues to work on all of these.

But at some point things become a little messy, so we did some simplifications =)

My napkin math suggests the size of the supercycle when you take two recurring events A and B with periods < 365 days and the 400 year cycle for DST transition can be about a hundred thousand years in the worst case.

But considering that timezones and daylight savings time are fundamentally political decisions, and that the IANA tzdata file is updated regularly, there is not much sense in applying the current rules thousands of years into the future! So we check at most 400 years ahead, and we consider that when timezones are involved, the overlap check is best effort and we prepare for the fact that it could miss a DST transition – no matter what math we do, a country could change the rules at any time, and a day that used to be 24h could now be 25, 22:30, or even skipped entirely, invalidating all past assumptions.

btown 13 hours ago||
Thank you so much for sharing this! That sounds like an equally fun and headache-inducing problem indeed!

The idea that someone could book a room for 400 years of consistency guarantees is somewhat hilarious, yet absolutely the kind of thing that would check a box in a regulated environment like healthcare!

It does speak to a larger issue I see quite often, which is that capturing structured intent as of the time an event is created, along with the context that led to that decision, is incredibly hard to model. No matter what, something about the system will change (like daylight savings time, or assumptions around resource/room availability) in an unpredictable way, such that the data used to drive a confident evaluation of lack-of-conflict at insertion time is no longer sufficient to automatically resolve conflicts.

There's no single way to solve this problem - in a way, "reinterpreting intent" is why programming is often equal parts archaeology and anthropology. Certainly gives us very interesting problems to solve!

To circle it back to the original conversation: the better our databases are at handling large volumes of written data, the more context we can capture around insertion-time intent in an easy-to-query way, and the better we'll be at adapting to unforeseen circumstances!

amluto 2 days ago|||
I use SERIALIZABLE on a database for which I have very low writer concurrency (I can count the total writer connections on 1-2 hands) and where I really, really, really don’t want to deal with the fallout if a transaction commits and results in an impossible state.

I use MySQL, not Postgres, for this application (for better or for worse), and I can absolutely generate a bad state if I drop MySQL to a level below SERIALIZABLE — I’ve tried it. (Yes, I could probably avoid this with SELECT FOR UPDATE, but I don’t trust MySQL enough and I get adequate performance with SERIALIZABLE.)

To make SERIALIZABLE work well, I wrap all the transactions in retry loops, and I deal with MySQL’s obnoxious reporting of which errors are worthy of retries.

(Aside from bad committed states, I’ve also seen MySQL return results from a single straightforward query that cannot correspond to any state of the table. It was something like selecting MIN, MAX and COUNT(*) from a table and getting min and max differing and count = 1. It’s been a while — I could be remembering wrong.)

_kidlike 2 days ago||
been using select for update for many years now, in production. never had any issues. (MariaDB)
thayne 3 days ago||
Eh. If you care about performance enough to use OrioleDB you probably also want to avoid SERIALIZABLE.
samlambert 3 days ago||
This is not an open source license and it's untrue to say it's an open source project when it's licensed this way.

"IF ANY LITIGATION IS INSTITUTED AGAINST SUPABASE, INC. BY A LICENSEE OF THIS SOFTWARE, THEN THE LICENSE GRANTED TO SAID LICENSEE SHALL TERMINATE AS OF THE DATE SUCH LITIGATION IS FILED."

This is a poison pill. At best the licensing is naive and blocks even customers of Supabase from using OrioleDB; at worst it's an attempt by Supabase to grant themselves unlimited indemnity under the guise of a community project. It means the moment you sue Supabase for anything (a contract dispute, IP, employment, an unrelated tort) you lose the license. They could lose your data, and if you try to do anything about it they can immediately countersue for a license violation. Using the brand of the Postgres license to slip this in is pretty wild.

OrioleDB looks like a promising project and is undoubtedly a great contribution from Supabase, but it's not open source or really usable by anyone with this license.

kiwicopple 3 days ago||
sam, I think you know me well enough now to know that we're open source through and through. my mistake - I should have been more involved in this process with the team.

It is now Apache 2.0, which grants patent rights and can be re-licensed to PostgreSQL when the code is upstreamed. I'll amend the blog to make that clearer.

https://github.com/orioledb/orioledb/pull/558

jitl 3 days ago|||
I recall Facebook had a similar rider on the React license for many years until eventually removing it. It's visually similar to the Apache 2 patent clause, but not scoped to just use of the licensed software.
seveibar 3 days ago||
Isn’t this just Apache 2-style permissive licensing?
samlambert 3 days ago||
lol no they both read as permissive on the surface. apache 2 doesn't include a termination clause that broadly protects an entity against any litigation. this is incredibly broad and not community safe.
boxed 3 days ago||
The graphs for OrioleDB look very impressive. Can someone give a counter argument to switching to this?
wwizo 3 days ago||
OrioleDB is not plug-and-play yet. From their docs (https://www.orioledb.com/docs):

> OrioleDB currently requires a set of patches to PostgreSQL to enhance the pluggable storage API and other PostgreSQL subsystems. All of these patches have been submitted to the PostgreSQL community and are under review.
mattashii 2 days ago||
(Opinions are my own, not of my employer)

I think that "under review" claim is doing some very heavy lifting, especially when it relates to their changes to index tuple lifecycle management. The patches that have been submitted are unlikely to get committed in full anytime soon, even after substantial changes to the patches' designs.

PostgreSQL just has not been designed for what OrioleDB is doing, and forcing OrioleDB's designs into PostgreSQL upstream would create a lot of (very) sharp edges that the community can't properly test without at least a baseline implementation - which critically hasn't been submitted to upstream. Examples of these sharp edges are variable-sized TIDs, MVCC-owning indexes, and table-AM-signalled index inserts.

There are certainly ideas in OrioleDB's designs that PostgreSQL can benefit from (retail index tuple deletion! self-clustering tables!), but these will need careful consideration of how they can be brought into the project without duplicating implementations at nearly every level. Grafting a downstream fork wholesale and hoping it'll work out well enough is just not how the PostgreSQL project works.

ksec 2 days ago||
Oh, this dampens my expectations. I was under the (wrong) impression that it would stabilise outside of Postgres before being merged.
Sesse__ 3 days ago||
You get basically most of the advantages of a B-tree-organized table, but also most of the disadvantages, AFAIK. In particular, any secondary index lookup/scan is going to take twice as long (index blocks don't point to the place on disk; they just contain the primary key, and then you need to go look up _that_ in the actual table).
akorotkov 3 days ago|||
This is generally true, but there are some additional aspects.

1. With the PostgreSQL heap, you need to access the heap page itself, and that's not free: the access goes through the buffer manager and other related components.

2. In OrioleDB, we have a lightweight protocol to read from pages. In-memory pages are connected using direct links (https://www.orioledb.com/docs/architecture/overview#dual-poi...), and pages are read lock-less (https://www.orioledb.com/docs/architecture/overview#page-str...). Additionally, tree navigation for simple data types skips both copying and tuple deforming (https://www.orioledb.com/blog/orioledb-fastpath-search).

Given all of the above, I believe OrioleDB still wins in the case of secondary key lookups. I think this is indirectly confirmed by the results of the TPC-C benchmark, which contains quite a lot of secondary key lookups. However, this subject is worth dedicated benchmarking in the future.

Sesse__ 3 days ago||
It would be interesting to see how OrioleDB does with more OLAP-like loads. From when I spent a lot of time benchmarking this, the indirect index design was _the_ main reason why MySQL+InnoDB was losing significantly to Postgres on TPC-H (well, DBT-3).[1] There was a lot of working around it with covering indexes etc.

Of course, the flip side of the coin is that if you do an UPDATE of a row in the presence of a secondary index, and the UPDATE doesn't touch the key, then you don't need to update the index(es) at all. So it really depends on how much you update rows versus how often you index-scan them IME.

[1] TPC-H doesn't have difficult enough queries to really stress the planner, so it mattered comparatively less there than in other OLAP work.

akorotkov 3 days ago||
Thank you, that will go on the TODO list.
jitl 3 days ago|||
That’s how regular Postgres b-tree indexes work too.
Sesse__ 3 days ago||
I'll take a [citation needed] on that one.
jitl 3 days ago|||
https://www.postgresql.org/docs/current/indexes-index-only-s...

This is why Postgres b-tree indexes offer CREATE INDEX (indexCol1, indexCol2, ...) INCLUDE (includeCol1, includeCol2, ...). With INCLUDE, the index will directly store the listed additional columns, so if your query does `SELECT includeCol1 WHERE indexCol1 = X AND indexCol2 > Y`, you avoid needing to look up the entire row in the heap, because includeCol1 is stored in the index already. This is called a "covering index" because the index itself covers all the data necessary to answer the query, and you get an "index only scan" in your query plan.
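
For instance (names made up; INCLUDE exists since Postgres 11):

    CREATE TABLE events (id bigint, kind int, ts timestamptz, payload text);
    CREATE INDEX events_kind_ts ON events (kind, ts) INCLUDE (payload);

    -- Can be answered by an Index Only Scan, since payload is stored in the
    -- index leaf pages (assuming the visibility map is reasonably up to date):
    EXPLAIN SELECT payload FROM events
    WHERE kind = 1 AND ts > now() - interval '1 day';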

The downside to creating covering indexes is that it's more work for Postgres to go update all the INCLUDE values in all your covering indexes at write time, so you are trading write speed for increased read speed.

I think it's quite typical to see this in SQL databases. SQLite behaves the same way for indexes; the exception is that if you create a WITHOUT ROWID table, then the table itself is sorted by primary key instead of by ROWID, so you get at most 1 index that maps directly to the row value. (sqlite docs: https://sqlite.org/withoutrowid.html)

Sesse__ 3 days ago||
That link directly contradicts what you are saying.

> This means that in an ordinary index scan, each row retrieval requires fetching data from both the index and the heap.

Note that it says _index and the heap_. Not _index and the primary index and the heap_. (For a B-tree-organized table, the leaf heap nodes are essentially the bottom of the primary index, so it means that to find anything, you need to follow the primary index from the top, which may or may not entail extra disk accesses. For a normal Postgres heap, this does not happen, you can just go directly to the right block.)

Index-only scans (and by extension, INCLUDE) are to avoid reaching into the heap at all.

> The downside to creating covering indexes is that it's more work for Postgres to go update all the INCLUDE values in all your covering indexes at write time, so you are trading write speed for increased read speed.

For updates, even those that don't touch INCLUDE values, Postgres generally needs to update the index anyway (this is the main weakness of such a scheme). HOT is an exception, where you can avoid that index update if there's room in the same heap block; index scans will then follow the marker(s) you left saying "here's the new row!" instead of fetching it directly.
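
You can see how often HOT kicks in via the statistics views, e.g. for a hypothetical table:

    -- n_tup_hot_upd counts updates that didn't need to touch the indexes.
    SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'accounts';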

jitl 2 days ago||
Ah gotcha. My understanding of the heap reference was that heap lookups involve traversing a heap-like tree data structure, not a direct link to a specific leaf page.

Based on the limited description of OrioleDB I understand it works like SQLite WITHOUT ROWID, actually storing the row tuple in the primary key b-tree, but I didn’t go read the code

Sesse__ 1 day ago||
It's confusing nomenclature, but Postgres' "heap" is not a heap (or B-tree) data structure. It is just a bunch of blocks all on the same level, much like the "heap" in an operating system (as opposed to the stack).

Notably, you can have a Postgres table without a primary key at all, not even an implicit one.

> Based on the limited description of OrioleDB I understand it works like SQLite WITHOUT ROWID, actually storing the row tuple in the primary key b-tree, but I didn’t go read the code

This is my understanding of OrioleDB as well.

akorotkov 3 days ago|||
Yep, regular PostgreSQL indexes point to a heap location (block number + offset). And it is the same for primary and secondary indexes.
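
You can see that location directly through the ctid system column; the output is (block number, offset):

    -- ctid works on any table; pg_class is used here so this runs as-is.
    SELECT ctid, relname FROM pg_class LIMIT 3;  -- e.g. (0,1), (0,2), (0,3)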
victorbjorklund 3 days ago||
No go when it includes a poison pill.
kiwicopple 3 days ago|
https://news.ycombinator.com/item?id=45200014
gallypette 3 days ago||
It is time to realize that open source drives innovation.
Luker88 2 days ago||
Legal shield? So once again people have to do manual legal work for something that can be taken for granted in any (L)GPL* software.

We should start counting the times permissively-licensed software needs this kind of protection, and then ask what the difference is between all this effort and just going GPL.

You don't need to hang onto hopes and public shaming until someone else does the hard part.

Still, kudos for the grant.

8cvor6j844qw_d6 3 days ago||
Is OrioleDB just PostgreSQL but with some underlying modifications for cloud environments?

How does it compare with Neon DB?

916c0553e164269 3 days ago||
"The differences between OrioleDB and Neon" ( June 20, 2025 )

https://www.orioledb.com/blog/orioledb-neon-differences

boxed 3 days ago|||
The "cloud environments" part sounds like marketing fluff. "The cloud" is just someone else's servers after all. There's nothing special about it.
IgorPartola 3 days ago|||
That’s like saying a chair is just a tree that has been modified. Technically true, practically there are some very specific differences.
throwaway894345 3 days ago||
What are the relevant differences? I’m not as cynical as the parent commenter, but I’m also unclear about what OrioleDB is doing that is meaningfully “CloudNative”. From skimming the main page, it seems like it’s just doing storage differently, but so far I’ve seen nothing to suggest that difference is “leveraging cloud services” or anything else.
IgorPartola 3 days ago||
I am not familiar with this particular product, but generally if you run on, say, AWS, you either need to account for the greatly increased disk latency (EBS is network storage) or build provisions for local storage, which is not necessarily unlimited, is attached to an unclear kind of disk controller, etc. It could also mean optimizing for AWS-specific CPU architectures. Or it could mean using S3 as storage, which has yet different durability and consistency semantics compared to other storage systems. It might also mean optimizing for the pricing of a given cloud provider in some way.
throwaway894345 3 days ago||
I agree that all of those things could be true, but I haven't read anything that indicates any special dependency on or knowledge of proprietary cloud systems. As far as I can tell, it's just going to use whatever disk/CPU you give it?
benjiro 2 days ago||||
> There's nothing special about it

Wait until you need to manage a bunch of servers yourself. Unfortunately, the solutions available are complex, and there is nothing where you can simply point something at a server, or a VPS, and get a quick, fully controlled, kernel-level solution. K8s, sure, but even that is not on the same level. And when you then need to manage DBs, you often place those outside K8s. Most scalable solutions like CRDB (10m pay, required yearly "free" license approval), Yugabyte (half broken), TiDB ... have their own issues and, again, do not tie into a completely managed cloud experience.

pbronez 3 days ago||||
In this case, it seems to refer to their support for S3-compatible object storage for persistence.
grandfugue 3 days ago|||
If you take a look at how storage is billed in the cloud, you'll see a huge difference. Networked storage, e.g. EBS, provides durability and survives VM restarts, but it is billed on IOPS. 200K IOPS is a piece of cake for today's NVMe, but a 200K-IOPS EBS volume easily costs you thousands per month. High-end NVMe devices, unfortunately, are all instance-level storage, which means the data is gone if you shut down your VM.
LtdJorge 3 days ago||
It’s a different storage engine for Postgres
exabrial 2 days ago||
So if someone wanted to implement this in MySql, what would the legal ramifications be?
exabrial 2 days ago|
Well, nm, pretty clear path now:

https://github.com/orioledb/orioledb/commit/44bab2aa9879feb7...

monadoid 3 days ago||
Love you supabase
awaseem 3 days ago|
Honestly so amazing! Supabase doing great work as always