Posted by whispem 4 days ago
I'm Emilie, I have a literature background (which explains the well-written documentation!) and I've been learning Rust and distributed systems by building minikv over the past few months. It recently got featured in Programmez! magazine: https://www.programmez.com/actualites/minikv-un-key-value-st...
minikv is an open-source, distributed storage engine built for learning, experimentation, and self-hosted setups. It combines a strongly-consistent key-value database (Raft), S3-compatible object storage, and basic multi-tenancy.
Features/highlights:
- Raft consensus with automatic failover and sharding - S3-compatible HTTP API (plus REST/gRPC APIs) - Pluggable storage backends: in-memory, RocksDB, Sled - Multi-tenant: per-tenant namespaces, role-based access, quotas, and audit - Metrics (Prometheus), TLS, JWT-based API keys - Easy to deploy (single binary, works with Docker/Kubernetes)
Quick demo (single node):
```bash git clone https://github.com/whispem/minikv.git cd minikv cargo run --release -- --config config.example.toml curl localhost:8080/health/ready
# S3 upload + read curl -X PUT localhost:8080/s3/mybucket/hello -d "hi HN" curl localhost:8080/s3/mybucket/hello
Docs, cluster setup, and architecture details are in the repo. I’d love to hear feedback, questions, ideas, or your stories running distributed infra in Rust!
Repo: https://github.com/whispem/minikv Crate: https://crates.io/crates/minikv
Minio used to do that but changed many years ago. Production-grade systems don't do that, for good reason. The only tool I've found is Rclone but it's not really meant to be exposed as a service.
Anyone knows of an option?
If you need “one file per object” for a specific workflow, it’s possible to add a custom backend or tweak volume logic — but as you noted, most production systems move away from that model for robustness. That said, minikv’s flexible storage API makes experimentation possible if that’s what your use-case demands and you’re fine with the trade-offs.
Let me know what your usage scenario is, and I can advise on config or feature options!
Few distributed filesystems/object stores seem to use Raft (or consensus at all) for replicating data because it's unnecessary overhead. Chain replication is one popular way for replicating data (which uses consensus to manage membership but the data path is outside of consensus).
- Raft is used for intra-shard strong consistency: within each "virtual shard" (256 in total), data and metadata are replicated via Raft (with leader election and log replication), not just for cluster membership;
- 2PC (Two-Phase Commit) is only used when a transaction spans multiple shards: this allows atomic, distributed writes across multiple partitions. Raft alone is not enough for atomicity here, hence the 2PC overlay;
- The design aims to illustrate real-world distributed transaction tradeoffs, not just basic data replication. It helps understand what you gain and lose with a layered model versus simpler replication like chain replication (which, as you noted, is more common for the data path in some object stores).
So yes, in a pure object store, consensus for data replication is often skipped in favor of lighter-weight methods. Here, the explicit Raft+2PC combo is an architectural choice for anyone learning, experimenting, or wanting strong, multi-shard atomicity. In a production system focused only on throughput or simple durability, some of this could absolutely be streamlined.
- Documentation: https://garagehq.deuxfleurs.fr/
- Git repo: https://git.deuxfleurs.fr/Deuxfleurs/garage
- Raft + 2PC together, as above, so people can see how distributed consensus and cross-shard atomicity actually operate and interplay (with their trade-offs);
- Several subsystems are written for readability and transparency (clean error propagation, explicit structures) even if that means a few more allocations or some lost microseconds;
- The storage layer offers different backends (RocksDB, Sled, in-memory) to let users experiment and understand their behavior, not because it’s always ideal to support so many;
- Features such as CDC (Change Data Capture), admin metrics, WAL status, and even “over-promiscuous” logs are exposed for teaching/tracing/debugging, though those might be reduced or hardened in production;
- Much of the CLI/admin API exposes “how the sausage is made,” which is gold for learning but might be hidden in a SaaS-like setting;
So yes, if I targeted only hyperscale production, some internals would be simplified or streamlined, but the educational and transparency value is central to this project’s DNA.
I plan to push an official image (and perhaps an OCI image with scratch base) as the project matures — open to suggestions on ideal platforms/formats.
What is the memory consumption under a significant load? That seems to be as much important as the throughput & latency.
- With the in-memory backend: Every value lives in RAM (with HashMap index, WAL ring buffer, TTL map, and Bloom filters). For a cluster with a few million objects, you’ll typically see a node use as little as 50–200 MB, scaling up with active dataset size and batch inflight writes;
- With RocksDB or Sled: Persistent storage keeps RAM use lower for huge sets but still caches hot keys/metadata and maintains Bloom + index snapshots (both configurable). The minimum stays light, but DB block cache, WAL write buffering, and active transaction state all add some baseline RAM (tens to a few hundreds of MB/node in practice);
- Heavy load (many concurrent clients, transactions, or CDC enabled): Buffers, Raft logs, and transaction queues scale up, but you can cap these in config (batch size, CDC buffer, WAL fsync policy, etc);
- Prometheus /metrics and admin API expose live stats, so you can observe resource use per node in production.
If you have a specific workload or dataset in mind, feel free to share it and I can benchmark or provide more precise figures!
Why people always lie with this? Especially in this case that they uploaded the entire log:
Date: Sat Dec 6 16:08:04 2025 +0100
Add hashing utilities and consistent hash ring
Date: Sat Dec 6 16:07:24 2025 +0100
Create mod.rs for common utilities in minikv
Date: Sat Dec 6 16:07:03 2025 +0100
Add configuration structures for minikv components
Date: Sat Dec 6 16:06:26 2025 +0100
Add error types and conversion methods for minikv
Date: Sat Dec 6 16:05:45 2025 +0100
Add main module for minikv key-value store
And this goes on until project is complete (which probably took 2~3h total if sum all sessions). Doubt learned anything at all. Well, other than that LLMs can solo complete simple projects.Comments in previous submission are also obviously AI generated. No wonder was flagged.
>Built in public as a learning-by-doing project
So, either the entire project was already written and being uploaded one file at the time (first modification since lowest commit mentioned is README update: https://github.com/whispem/minikv/commit/6fa48be1187f596dde8..., clearly AI generated and clearly AI used has codebase/architecture knowledge), and this claim is false, or they're implementing a new component every 30s.
https://github.com/whispem/minikv/commit/6e01d29365f345283ec...
Rapid CI is essential for catching bugs early, allowing fast iteration and a healthy contribution workflow. I sometimes use small, continuous commits (“commit, push, fix, repeat”) during intense development or when onboarding new features, and the fast CI loop helps maintain momentum and confidence in code quality.
If you’re curious about the setup, it’s all described in LEARNING.md and visible in the repo’s .github/workflows/ scripts!