Posted by charleshn 4 days ago
This made sense for product catalogs, employee/department records and e-commerce-style use cases.
But it's an extremely poor fit for storing a world model that LLMs are building in an opaque and probabilistic way.
Prediction: a new data model will take over in the next 5 years. It might borrow principles from decades of relational DB work, but it will also differ in fundamental ways.
I was familiar with Solarflare and Mellanox zero-copy setups from a previous fintech role, but at that time it all relied on black boxes (specifically out-of-tree kernel modules, delivered as blobs without DKMS or equivalent support, a real headache to live with) that didn't always work perfectly. It was pretty frustrating overall, because the customer paying the bill (rightfully) had less than zero tolerance for performance fluctuations. And fluctuations were annoyingly common despite my best efforts: dedicating a core to IRQ handling, masking the kernel to another core at boot, then pinning the user-space workloads to specific cores, and so on. It was quite an extreme setup, with a GPS-disciplined oscillator and millimetre-perfect antenna wiring for the NTP setup, etc. We built two identical setups, one in Hong Kong and one in New York. Ah, very good fun overall, but frustrating because of how immature the stack was at the time.
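For the user-space half of that tuning, here is a minimal sketch of pinning a process to chosen cores on Linux with Python's standard os.sched_setaffinity. The helper name pin_current_process is hypothetical, and the IRQ and kernel-side isolation (typically done with the isolcpus boot parameter and /proc/irq/*/smp_affinity) happens outside the program and isn't shown.

```python
import os


def pin_current_process(cores):
    """Restrict the caller to the given set of CPU cores (Linux only)."""
    allowed = os.sched_getaffinity(0)   # cores the caller may currently run on
    requested = set(cores)
    if not requested <= allowed:
        raise ValueError(f"cores {sorted(requested - allowed)} are not available")
    os.sched_setaffinity(0, requested)  # pid 0 means "the caller"
    return os.sched_getaffinity(0)      # read back the mask the kernel applied


if __name__ == "__main__":
    # Pin to the highest-numbered allowed core, leaving low cores (which often
    # handle IRQs and housekeeping) alone where possible.
    target = max(os.sched_getaffinity(0))
    print("now running on cores:", sorted(pin_current_process({target})))
```

Picking the core from the current affinity mask, rather than hard-coding one, keeps the sketch working inside containers or cgroups that only expose a subset of CPUs.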
It turns out that B-trees are still efficient for this work, at least until hardware vendors deign to give us an interface to SSDs that looks more like RAM.
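A back-of-the-envelope sketch of why page-sized B-tree nodes hold up so well on SSDs: with a wide fanout, the number of nodes a point lookup visits (the tree height) stays tiny even for billions of keys, and the few upper levels are small enough to stay cached in memory. The 4 KiB page size, 16-byte entry size and completely full nodes are assumptions for illustration; btree_height is a hypothetical helper.

```python
import math


def btree_height(num_keys: int, page_size: int = 4096, entry_size: int = 16) -> int:
    """Levels a point lookup visits in a B-tree whose nodes are one page each.

    Assumes every node is completely full, so fanout = page_size // entry_size.
    Real trees run partially full, which adds little height.
    """
    fanout = page_size // entry_size
    height, capacity = 1, fanout
    while capacity < num_keys:
        height += 1
        capacity *= fanout      # each extra level multiplies reachable keys by fanout
    return height


if __name__ == "__main__":
    # Page-sized B-tree levels vs. binary-search-tree depth for the same key count.
    for n in (10**6, 10**9):
        print(n, btree_height(n), math.ceil(math.log2(n)))
    # prints:
    # 1000000 3 20
    # 1000000000 4 30
```

Four page reads for a billion keys, most of them hitting cached upper levels, is why the structure survives the move from spinning disks to flash.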
I recommend reading https://www.cs.cit.tum.de/dis/research/leanstore/ along with the associated papers and follow-up work.
In the meantime, with RAM prices skyrocketing, work and research on buffer and page management for larger-than-memory DBs is set to be Hot Stuff again.
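A toy sketch of the core loop of a buffer manager for a larger-than-memory database: a fixed number of in-memory frames cache pages from slower storage, pinned pages cannot be evicted, and a miss evicts the least recently used unpinned page, writing it back first if it is dirty. This is plain LRU for illustration, not what LeanStore does (its papers describe pointer swizzling and a lighter-weight replacement scheme). The BufferPool class and the dict standing in for the SSD are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: `capacity` in-memory frames in front of a slower page store."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # page_id -> bytes; a dict standing in for the SSD
        self.frames = OrderedDict()   # page_id -> bytearray, least recently used first
        self.pins = {}                # page_id -> number of active users of the page
        self.dirty = set()            # pages modified in memory but not yet written back
        self.reads = 0                # count of "disk" reads, to make misses visible

    def pin(self, page_id):
        """Return the in-memory copy of a page, loading it on a miss."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict_one()
            self.frames[page_id] = bytearray(self.disk[page_id])
            self.reads += 1
        self.pins[page_id] = self.pins.get(page_id, 0) + 1
        return self.frames[page_id]

    def unpin(self, page_id, dirty=False):
        """Release one use of a page; pass dirty=True if the caller modified it."""
        self.pins[page_id] -= 1
        if dirty:
            self.dirty.add(page_id)

    def _evict_one(self):
        # Pick the least recently used page that nobody currently has pinned.
        victim = next((p for p in self.frames if self.pins.get(p, 0) == 0), None)
        if victim is None:
            raise RuntimeError("buffer pool exhausted: every frame is pinned")
        if victim in self.dirty:               # write back before dropping it
            self.disk[victim] = bytes(self.frames[victim])
            self.dirty.discard(victim)
        del self.frames[victim]
        self.pins.pop(victim, None)
```

Pin counts are what make this usable under a B-tree: a node being traversed or modified cannot be evicted out from under the caller. A real buffer manager also needs latching for concurrent access, which this single-threaded toy skips.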
I like working in this area.
The amount of performance you can extract from a modern CPU once you really start optimising cache access patterns is astounding.
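One classic cache optimisation is loop tiling (blocking). Here it is sketched on a matrix transpose. The naive version reads along rows but writes with a stride of a whole row, so nearly every write touches a different cache line. The tiled version works through small square blocks whose source and destination lines stay resident while the block is processed. Python's interpreter overhead hides the speedup, so treat this as the loop structure you would carry into C, Rust or similar. The block size of 32 is an arbitrary assumption to tune per machine.

```python
def transpose_naive(src, n):
    """Transpose an n x n row-major matrix stored in a flat list."""
    dst = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            dst[j * n + i] = src[i * n + j]   # writes jump n elements each step
    return dst


def transpose_tiled(src, n, block=32):
    """Same result, but visits the matrix in block x block tiles so the cache
    lines touched by one tile are reused before they are evicted."""
    dst = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # min() handles the ragged edge when n is not a multiple of block.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    dst[j * n + i] = src[i * n + j]
    return dst
```

Both functions perform exactly the same element moves; only the order changes, which is the whole point: the arithmetic is identical and the memory traffic is not.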
High-performance networking is another area like this. High-performance NICs still go to great lengths to provide a BSD socket experience to devs, and you can still get 80-90% of the performance advantages of kernel bypass without abandoning that model.
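To make the socket-compatibility point concrete: latency-sensitive code often keeps the ordinary BSD socket calls and simply busy-polls a non-blocking socket, so the receiving thread never sleeps waiting to be woken. Kernel-bypass stacks that preserve the sockets API (Solarflare's OpenOnload, which intercepts these calls via LD_PRELOAD, is the commonly cited example) aim to run this kind of unmodified code faster. This sketch uses only the standard library; busy_poll_recv and its spin limit are hypothetical.

```python
import socket


def busy_poll_recv(sock, max_spins=1_000_000, bufsize=2048):
    """Receive one datagram by spinning on a non-blocking socket.

    Instead of blocking (sleeping until the kernel wakes the thread), the
    caller keeps its core busy and retries immediately, trading CPU for latency.
    Returns None if nothing arrived within max_spins attempts.
    """
    sock.setblocking(False)
    for _ in range(max_spins):
        try:
            return sock.recv(bufsize)
        except BlockingIOError:
            continue   # no data yet: retry right away instead of yielding the core
    return None


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))                   # let the OS pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(b"tick", rx.getsockname())
    print(busy_poll_recv(rx))
    tx.close()
    rx.close()
```

Spinning burns a whole core per socket, which is usually acceptable when cores are already dedicated and pinned as in a low-latency trading setup.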
I think this was one of the main points behind the Odin programming language, and I want to emphasise that.