Top
Best
New

Posted by lifty 6/28/2025

People Keep Inventing Prolly Trees(www.dolthub.com)
191 points | 44 commentspage 2
zombot 7/1/2025||
What the hell does "probabilistically balanced" mean?
timsehn 7/2/2025||
We actually wrote another blog about this because I had the same question. I am the CEO of DoltHub so I can have my engineers write stuff to explain it to me :-)

https://www.dolthub.com/blog/2025-06-26-prolly-tree-balance/

moomin 7/1/2025|||
It means the balancing is content-dependent, and is normally balanced, but certain edge-case inputs may result in sub-optimal behaviour.
zombot 7/1/2025||
What is probabilistic about that? It sounds deterministic.
judofyr 7/1/2025|||
The opposite of probabilistic is not deterministic in this context. This is not about «drawing a random number», but rather that balancing is dependent on the input data. «With high probability» here means «majority of the possible input data leads to a balanced structure».

If it was not probabilistic then the balancing would be guaranteed in all cases. This typically means that it somehow stores balancing information somewhere so that it can detect when something is unbalanced and repair it. In this data structure we’re just hashing the content without really caring about the current balance and then it turns out that for most inputs it will be fine.

mcherm 7/1/2025||||
The specific behavior of the hash function (including the salt you chose). Choosing a different hash function (or a different salt) would result in a different breakdown into chunks.

In principle, if your data were specially crafted to exploit the specific hash function (and salt) you could get an aberrant case like 1 million entries in a single b-tree node or a million b-tree nodes with just one entry. But unless you intentionally exploit the hash function the chance of this is vanishingly small.

stonemetal12 7/1/2025||||
Like quicksort is O(N log N) on average but can degrade to O(N^2) in the worse case. The tree is balanced on average, but can degrade to not close to balanced in the worst case.
moomin 7/1/2025|||
It's a term of art. We say the same thing about quicksort being "usually" O(n log n)
guywithahat 7/1/2025|
One things an exec said at my old job I liked was “research is generally ~5-10 years behind industry”, and I see that seems to still be the case for prolly trees