Show HN: Minikv – Distributed key-value and object store in Rust (Raft, S3 API)

Posted by whispem 4 days ago

Show HN: Minikv – Distributed key-value and object store in Rust (Raft, S3 API)(github.com)

Hi HN,

I'm Emilie, I have a literature background (which explains the well-written documentation!) and I've been learning Rust and distributed systems by building minikv over the past few months. It recently got featured in Programmez! magazine: https://www.programmez.com/actualites/minikv-un-key-value-st...

minikv is an open-source, distributed storage engine built for learning, experimentation, and self-hosted setups. It combines a strongly-consistent key-value database (Raft), S3-compatible object storage, and basic multi-tenancy.

Features/highlights:

- Raft consensus with automatic failover and sharding - S3-compatible HTTP API (plus REST/gRPC APIs) - Pluggable storage backends: in-memory, RocksDB, Sled - Multi-tenant: per-tenant namespaces, role-based access, quotas, and audit - Metrics (Prometheus), TLS, JWT-based API keys - Easy to deploy (single binary, works with Docker/Kubernetes)

Quick demo (single node):

```bash git clone https://github.com/whispem/minikv.git cd minikv cargo run --release -- --config config.example.toml curl localhost:8080/health/ready

# S3 upload + read curl -X PUT localhost:8080/s3/mybucket/hello -d "hi HN" curl localhost:8080/s3/mybucket/hello

Docs, cluster setup, and architecture details are in the repo. I’d love to hear feedback, questions, ideas, or your stories running distributed infra in Rust!

Repo: https://github.com/whispem/minikv Crate: https://crates.io/crates/minikv

71 points | 36 comments

remram 3 days ago|

Bit of a tangent, but what I'm looking for is a S3-compatible server with transparent storage, ie storing each file (object) as an individual file on disk.

Minio used to do that but changed many years ago. Production-grade systems don't do that, for good reason. The only tool I've found is Rclone but it's not really meant to be exposed as a service.

Anyone knows of an option?

whispem 3 days ago||

minikv actually supports a fully S3-compatible API (PUT/GET/BATCH, including TTL extensions and real-time notifications). By default, the storage engine is segmented/append-only with object records in blob files, not “one file per object”. However, you can configure a backend (like the in-memory mode for dev/test, or Sled/RocksDB) and get predictable, transparent storage behavior for objects. Storing each object as an individual file isn’t the default — for durability and atomics, objects are grouped inside segment files to enable fast compaction, consistent snapshots, and better I/O performance.

If you need “one file per object” for a specific workflow, it’s possible to add a custom backend or tweak volume logic — but as you noted, most production systems move away from that model for robustness. That said, minikv’s flexible storage API makes experimentation possible if that’s what your use-case demands and you’re fine with the trade-offs.

Let me know what your usage scenario is, and I can advise on config or feature options!

unmole 3 days ago|||

s3proxy: https://github.com/gaul/s3proxy

efemeryl 3 days ago||

[dead]

eatonphil 3 days ago||

Great educational project! I'm curious why you are using Raft and also 2PC unless you're sharding data and doing cross-shard transactions? Or is Raft only for cluster membership but 2PC is for replicating data? If that's the case it kind of seems like overkill but I'm not sure.

Few distributed filesystems/object stores seem to use Raft (or consensus at all) for replicating data because it's unnecessary overhead. Chain replication is one popular way for replicating data (which uses consensus to manage membership but the data path is outside of consensus).

whispem 3 days ago|

Thank you for this sharp and detailed question! In minikv, both Raft and 2PC are purposefully implemented, which may seem “overkill” in some contexts, but it serves both education and production-grade guarantees:

- Raft is used for intra-shard strong consistency: within each "virtual shard" (256 in total), data and metadata are replicated via Raft (with leader election and log replication), not just for cluster membership;

- 2PC (Two-Phase Commit) is only used when a transaction spans multiple shards: this allows atomic, distributed writes across multiple partitions. Raft alone is not enough for atomicity here, hence the 2PC overlay;

- The design aims to illustrate real-world distributed transaction tradeoffs, not just basic data replication. It helps understand what you gain and lose with a layered model versus simpler replication like chain replication (which, as you noted, is more common for the data path in some object stores).

So yes, in a pure object store, consensus for data replication is often skipped in favor of lighter-weight methods. Here, the explicit Raft+2PC combo is an architectural choice for anyone learning, experimenting, or wanting strong, multi-shard atomicity. In a production system focused only on throughput or simple durability, some of this could absolutely be streamlined.

LunaSea 3 days ago||

Hello, cool project, did you think about maybe contributing to the key-value store feature of Garage, which is also a Rust project by open source development lab Deux Fleurs?

whispem 3 days ago|

Hello! Thank you for your message. I don’t know this project, do you have a GitHub link maybe?

LunaSea 3 days ago||

Sure, here you go:

- Documentation: https://garagehq.deuxfleurs.fr/

- Git repo: https://git.deuxfleurs.fr/Deuxfleurs/garage

whispem 3 days ago||

Thanks!

iryna_kondr 3 days ago||

Hi Emilie, nice project, thanks for sharing. I’m curious whether there were any decisions that you added mainly for educational value even though you wouldn’t make the same call in a production system?

whispem 3 days ago|

Thanks for the feedback and for the question! A number of choices in minikv are explicitly made to explain distributed system ideas clearly, even if not always optimal for hyperscale prod environments:

- Raft + 2PC together, as above, so people can see how distributed consensus and cross-shard atomicity actually operate and interplay (with their trade-offs);

- Several subsystems are written for readability and transparency (clean error propagation, explicit structures) even if that means a few more allocations or some lost microseconds;

- The storage layer offers different backends (RocksDB, Sled, in-memory) to let users experiment and understand their behavior, not because it’s always ideal to support so many;

- Features such as CDC (Change Data Capture), admin metrics, WAL status, and even “over-promiscuous” logs are exposed for teaching/tracing/debugging, though those might be reduced or hardened in production;

- Much of the CLI/admin API exposes “how the sausage is made,” which is gold for learning but might be hidden in a SaaS-like setting;

So yes, if I targeted only hyperscale production, some internals would be simplified or streamlined, but the educational and transparency value is central to this project’s DNA.

_s_a_m_ 3 days ago||

I there an official docker image? I am looking for something more light-weighted than MinIO. What are the requirements?

whispem 3 days ago||

There’s not an “official” image on Docker Hub yet, but the repo ships with a ready-to-use Dockerfile and a Compose cluster example. You can build with docker build . and spin up multi-node clusters trivially. Static Rust binaries make the image compact (typically ≤30MB zipped; nothing compared to MinIO :)), with no heavy runtimes. Requirements are dead simple: a recent Docker engine, any x86_64 (or ARM) host, and a few tens of MB RAM per instance at low load, scaling with data size/traffic.

I plan to push an official image (and perhaps an OCI image with scratch base) as the project matures — open to suggestions on ideal platforms/formats.

flakron 3 days ago||

Have you checked garage - https://garagehq.deuxfleurs.fr ? Not affiliated nor trying to overshadow the posted project

whispem 3 days ago||

Yes! I'll check as soon as possible

kunley 3 days ago||

Looks nice.

What is the memory consumption under a significant load? That seems to be as much important as the throughput & latency.

whispem 3 days ago|

Very relevant question! The memory profile in minikv depends on usage scenario and storage backend.

- With the in-memory backend: Every value lives in RAM (with HashMap index, WAL ring buffer, TTL map, and Bloom filters). For a cluster with a few million objects, you’ll typically see a node use as little as 50–200 MB, scaling up with active dataset size and batch inflight writes;

- With RocksDB or Sled: Persistent storage keeps RAM use lower for huge sets but still caches hot keys/metadata and maintains Bloom + index snapshots (both configurable). The minimum stays light, but DB block cache, WAL write buffering, and active transaction state all add some baseline RAM (tens to a few hundreds of MB/node in practice);

- Heavy load (many concurrent clients, transactions, or CDC enabled): Buffers, Raft logs, and transaction queues scale up, but you can cap these in config (batch size, CDC buffer, WAL fsync policy, etc);

- Prometheus /metrics and admin API expose live stats, so you can observe resource use per node in production.

If you have a specific workload or dataset in mind, feel free to share it and I can benchmark or provide more precise figures!

frwickst 3 days ago|

Last posted 16 days ago: https://news.ycombinator.com/item?id=46661308

whispem 3 days ago||

Yes, I know. I had the opportunity to request a review of my first post (which was flagged) following my email to the moderators of HN. After checking, the moderator told me to redo a post because indeed I was wrongly flagged by some people here.

forgotpwd16 3 days ago||

>All the code, architecture, logic, and design in minikv were written by me, 100% by hand.

Why people always lie with this? Especially in this case that they uploaded the entire log:

  Date:   Sat Dec 6 16:08:04 2025 +0100
      Add hashing utilities and consistent hash ring
  Date:   Sat Dec 6 16:07:24 2025 +0100
      Create mod.rs for common utilities in minikv
  Date:   Sat Dec 6 16:07:03 2025 +0100
      Add configuration structures for minikv components
  Date:   Sat Dec 6 16:06:26 2025 +0100
      Add error types and conversion methods for minikv
  Date:   Sat Dec 6 16:05:45 2025 +0100
      Add main module for minikv key-value store

And this goes on until project is complete (which probably took 2~3h total if sum all sessions). Doubt learned anything at all. Well, other than that LLMs can solo complete simple projects.

Comments in previous submission are also obviously AI generated. No wonder was flagged.

yes_man 3 days ago|||

You have never split your working tree changes into separate commits?

forgotpwd16 3 days ago|||

Irrelevant question. In README has:

>Built in public as a learning-by-doing project

So, either the entire project was already written and being uploaded one file at the time (first modification since lowest commit mentioned is README update: https://github.com/whispem/minikv/commit/6fa48be1187f596dde8..., clearly AI generated and clearly AI used has codebase/architecture knowledge), and this claim is false, or they're implementing a new component every 30s.

whispem 3 days ago|||

I had the opportunity to request a review of my first post (which was flagged) following my email to the moderators of HN. I didn’t use AI for the codebase, only for .md files & there's no problem with that. My project was reviewed by moderators, don't worry. If the codebase or architecture was AI generated this post would not have been authorized and therefore it would not have been published.

skylurk 3 days ago|||

How does this deleted fix_everything.sh fit in to your story?

https://github.com/whispem/minikv/commit/6e01d29365f345283ec...

whispem 3 days ago||

I don't see the problem to be honest

skylurk 3 days ago||

Hmm. You doth protest too much, methinks :)

rollulus 3 days ago|||

I thought that your “background in literature” contributed to the “well-written docs”, but that was LLMs!

whispem 3 days ago||

No, I was helped (.md files only) by AI to rewrite but the majority of the doc is written by myself, I just asked for help from the AI for formatting for example.

johnbellone 3 days ago|||

I am not going to pretend to know what this person did, but I've definitely modified many things at once and made distinct commits after the fact (within 30s). I do not find it that abnormal.

whispem 3 days ago||

Thanks a lot! I make distinct commits "every 30s" because I'm focused and I test my project. If the CI is green, I don't touch of anything. If not, I work on the project until the CI is fully green.

jabron 3 days ago||

What does that mean? You got feedback from the CI within 30 seconds and immediately pushed a fix?

whispem 3 days ago||

Yes, in minikv, I set up GitHub Actions for automated CI. Every push or PR triggers tests, lint, and various integration checks — with a typical runtime of 20–60 seconds for the core suite (thanks to Rust’s speed and caching). This means that after a commit, I get feedback almost instantly: if a job fails, I see the logs and errors within half a minute, and if there’s a fix needed, I can push a change right away.

Rapid CI is essential for catching bugs early, allowing fast iteration and a healthy contribution workflow. I sometimes use small, continuous commits (“commit, push, fix, repeat”) during intense development or when onboarding new features, and the fast CI loop helps maintain momentum and confidence in code quality.

If you’re curious about the setup, it’s all described in LEARNING.md and visible in the repo’s .github/workflows/ scripts!

jabron 3 days ago||

So you read the CI result, implement a fix and stage + commit your changes in ~10 seconds? You might be superhuman.

whispem 3 days ago|||

Yes, I do split my working tree into separate commits whenever possible! I use interactive staging (git add -p) to split logical chunks: features, fixes, cleanups, and documentation are committed separately for clarity. Early in the project (lots of exploratory commits), some changes were more monolithic, but as minikv matured, I've prioritized clean commit history to make code review and future changes easier. Always happy to get workflow tips — I want the repo to be easy to follow for contributors!

kryptiskt 3 days ago|||

It looks like that if you want logically separated commits from a chunk of programming you have done. Stage a file or a hunk or two, write commit message, commit, rinse and repeat.

whispem 3 days ago||

Absolutely: for all meaningful work I prefer small, logical commits using git add -p or similar, both for history clarity and for reviewer sanity. In initial “spike” or hack sessions (see early commits :)), it’s sometimes more monolithic, but as the codebase stabilized I refactored to have tidy, atomic commit granularity. I welcome suggestions on workflow or PR polish!