Posted by pranabsarkar 7 hours ago
YantrikDB is a cognitive memory engine — embed it, run it as a server, or connect via MCP. It thinks about what it stores: consolidation collapses duplicate memories, contradiction detection flags incompatible facts, temporal decay with configurable half-life lets unimportant memories fade like human memory does.
Single Rust binary. HTTP + binary wire protocol. 2-voter + 1-witness HA cluster via Docker Compose or Kubernetes. Chaos-tested failover, runtime deadlock detection (parking_lot), per-tenant quotas, Prometheus metrics. Ran a 42-task hardening sprint last week — 1178 core tests, cargo-fuzz targets, CRDT property tests, 5 ops runbooks.
Live on a 3-node Proxmox homelab cluster with multiple tenants. Alpha — primary user is me, looking for the second one.
I tried to write the consolidation/conflict-detection logic on top of ChromaDB. It didn't work — the operations need to be transactional with the vector index, and they need an HLC for ordering across nodes. So I built it as a database.
The cognitive operations (think, consolidate, detect_conflicts, derive_personality) are the actual differentiator. The clustered server is what made me confident enough to ship — I needed to know the data was safe before I'd put real work on it.
What I genuinely want to know: is this solving a problem you're hitting with your AI agent's memory, or did I build a really polished thing for my own narrow use case? Honest reactions help more than encouragement.
I'm in the middle of building an agent harness and I haven't had to deal with long-running memory issues yet, but I will have to deal with it soon.
I'm incredibly interested in this as a product, but I think it makes too many assumptions about how to prune information. Sure, this looks amazing on extremely simple facts, but most information is not reducible to simple facts.
"CEO is Alice" and "CEO is Bob" may or may not actually be contradictions and you simply cannot tell without understanding the broader context. How does your system account for that context?
Example: Alice and Bob can both be CEO in any of these cases:
* The company has two CEOs. Rare and would likely be called "co-CEO"
* The company has sub-organizations with CEOs. Matt Garman is the CEO of AWS. Andy Jassy is the CEO of Amazon. Amazon has multiple people named "CEO".
* Alice and Bob are CEOs of different companies (perhaps, this is only implicit)
* Alice is the current CEO. Bob is the previous CEO. Both statements are temporally true.
This is what I run into every time I try to do conflict detection and resolution. Pruning things down to facts doesn't provide sufficient context to understand how or why a statement was made.
Graph edges carry scope. "Alice ceo_of Acme" and "Andy ceo_of Amazon" are two edges with different src/dst; the conflict scanner looks for any (src, rel_type) pair with ≥2 distinct dsts, so Garman/Jassy don't false-flag as long as the edges are actually modeled. The gap: most agents just write raw sentences and never call relate().
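A minimal sketch of that scan (the tuple shape is my assumption, not YantrikDB's actual schema; note the edge direction matters — the company has to be the src for the ≥2-dst rule to fire on competing CEO claims):

```python
from collections import defaultdict

def scan_conflicts(edges):
    """Group edges by (src, rel_type) and flag any group that points
    at two or more distinct dsts."""
    groups = defaultdict(set)
    for src, rel, dst in edges:
        groups[(src, rel)].add(dst)
    return {key: dsts for key, dsts in groups.items() if len(dsts) >= 2}

edges = [
    ("Acme", "has_ceo", "Alice"),
    ("Acme", "has_ceo", "Bob"),          # same (src, rel), two dsts: flagged
    ("AWS", "has_ceo", "Matt Garman"),   # different src: no false flag
    ("Amazon", "has_ceo", "Andy Jassy"),
]
conflicts = scan_conflicts(edges)
print(conflicts)  # flags only the ('Acme', 'has_ceo') pair
```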
Temporal decay handles "previous vs current" weakly. half_life × importance attenuates old memories. But that's fade, not logical supersession — the DB doesn't know time-of-validity, only time-of-writing.
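For concreteness, here's the shape I mean by half_life × importance attenuation (an assumed formula for illustration, not necessarily YantrikDB's exact one):

```python
def effective_weight(importance, age_seconds, half_life_seconds):
    """Exponential half-life decay: the weight halves every half-life.
    Importance scales the baseline, so important memories start higher
    and take longer to fade below a recall cutoff. (Assumed shape of
    the half_life x importance attenuation, for illustration.)"""
    return importance * 0.5 ** (age_seconds / half_life_seconds)

# With a 1-day half-life, a 2-day-old memory keeps 25% of its weight.
weight = effective_weight(1.0, 2 * 86400, 86400)
print(weight)  # 0.25
```

Note this is pure fade: the 2-day-old fact gets quieter but is never marked as superseded, which is exactly the weakness above.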
Namespaces segregate scope when the agent uses them. Leans on the agent.
Honest result from a bench I ran today (posted elsewhere in this thread): I seeded 6 genuine contradictions in 59 memories, and think() flagged 60. ~54 are noise or ambiguous, exactly in the ways you listed. Filed as issue #3.
Design stance: contradictions are surfaced, not resolved. yantrikdb_conflicts returns a review queue; the agent has conversation context, the DB doesn't. "These two may be in tension" not "these are contradictory." That doesn't fix your point — it admits the DB can't make that call alone. Co-CEOs, subsidiaries, temporal supersession need typed-relations + time-of-validity schema work. That's v0.6, not v0.5.
Top-quality AI slop. I hate this.
To the author: project aside, it's not a good look to let an LLM drive your HN profile.
The failure modes were multiple:
* Facts rarely exist in a vacuum; they carry lots of subtlety.
* Inferring facts from conversation has a gazillion failure modes. Irony and sarcasm in particular lead to hilarious outcomes (joking about a six-pack with a fat buddy -> "XYZ is interested in achieving an athletic form"), but even things as simple as extracting a concrete date too often go wrong.
* Facts are almost never as binary as they seem. "ABC has the flights booked for the Paris trip." Then I decided afterwards to continue to New York to visit a friend instead of going home, and completely stumped the agent.
Two small clarifications:
remember(text, importance, domain) takes a free-form string — nothing forces atomic facts. A QMD-style prose block, a procedure, a dated plan, all work. The irony/sarcasm-inverts-the-fact failure mode lives in the agent's extraction layer, not the backend. So "write narrative into it, recall narrative out" is a legitimate usage pattern; the DB is agnostic.
YantrikDB's actual differentiator vs mem0 is temporal decay + consolidation + conflict detection, not smarter fact extraction. The "ABC has the Paris flight booked → actually I'm going to NYC" problem is meant to be addressed by decay (the old fact fades) and contradiction flagging (the new one triggers a conflict for the agent to resolve). But — honest read — my bench today showed conflict detection needs work to actually fire on raw text. Filed as issues #1 and #2, fixing now.
Broader point stands though: if the agent is producing brittle inferred facts upstream, no memory backend saves it. The DB can manage rot and contradiction. It can't fix bad inference. For what it's worth, I mostly use it for durable role context ("user is a data scientist on observability") rather than event lifecycle ("Paris flight booked") — the latter is what prose summarization is genuinely better at, and I think you're right that mem0-style auto-extraction applied to lifecycle events is a bad shape.
The fundamental breakthrough with LLMs is that they handle semantic mapping for you and can (albeit non-deterministically) interpret the meaning and relationships between concepts with a pretty high degree of accuracy, in context.
It just makes me wonder if you could dramatically simplify the schema and data modeling by incorporating more of these learnings.
I have a simple experiment along these lines that’s especially relevant given the advent of one-million-token context windows, although I don’t consider it a scientifically backed or production-ready concept, just an exploration: https://github.com/tcdent/wvf
My counter, qualified: deterministic consolidation is cheap and reproducible in a way LLM-in-the-loop consolidation isn't, at least today. Every think() invocation is free (cosine + entity matching + SQL). If I put an LLM in the loop the cost is O(N²) LLM calls per consolidation pass — for a 10k-memory database, that's thousands of dollars of inference per tick. So for v1 I'm trading off "better merge decisions" against "actually runs every 5 minutes without burning a budget."
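For scale, the O(N²) arithmetic behind that claim, with a purely illustrative per-call price (my assumption, not a quoted rate):

```python
# Back-of-envelope for pairwise LLM comparison over a 10k-memory database.
n = 10_000
pairs = n * (n - 1) // 2      # ~5e7 unordered pairs per consolidation pass
cost_per_call = 0.0001        # assumed $ per call on a cheap model
cost = pairs * cost_per_call  # lands in the thousands of dollars per pass
print(pairs, cost)
```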
On 1M-token context windows: I think they push the "vector DB break point" out but don't remove it. Context stuffing still has recall-precision problems at scale (lost-in-the-middle, attention dilution on unrelated facts), and 1M tokens ≠ unbounded memory. At 10M memories, no context window saves you.
wvf is interesting — just read through. The "append everything, let the model retrieve" approach is the complement of what I'm doing: you lean fully into LLM semantics, I try to do the lookup deterministically. Probably both are right for different workloads. Yours wins when you have unbounded compute + a small corpus; mine wins when you have bounded compute + a large corpus that needs grooming.
Starring wvf now. Curious if you're seeing meaningful quality differences between your approach and traditional retrieval at scale.
Absolutely agree the deterministic, performance-oriented mindset is still essential for large workloads. Are you expecting that this supplements a traditional vector/semantic store, or that it supersedes it?
My focus has absolutely been on relatively small corpora, which is supported by forcing a subset of data to be included by design. There are intentionally no conventions for things like "we talked about how AI is transforming computing at 1AM"; instead it attempts to focus on "user believes AI is transforming computing", so hopefully there's less of the context poisoning that happens with current memory.
Haven't deployed WVF at any scale yet; just a casual experiment among many others.
What I do have right now:
* 1178 core unit tests, including CRDT convergence property tests via proptest (for any sequence of ops, the final state is order-independent)
* Chaos test harness: Docker'd 3-node cluster with leader-kill / network-partition / kill-9 scenarios (tests/chaos/ in the repo)
* cargo-fuzz targets against the wire protocol and oplog deserializer
* Live usage: running on my 3-node homelab cluster with two real tenants (small: a TV-writing agent and another experiment) for the past few weeks. Caught a real production self-deadlock during this period (v0.5.8), which is what triggered the 42-task hardening sprint.

What I don't have and should: a recall-quality-over-time benchmark. Something like: seed 5,000 memories with known redundancy and contradictions, measure recall precision@10 before and after think(), and publish the curve. That's the evidence you're asking for, and you're right that it's missing. I'll run that and post the numbers in a follow-up.
Fair point on the ASCII diagram too: the website has proper rendering (yantrikdb.com), but the README should have an SVG.
Appreciate the pushback — this is more useful than encouragement.
What's the loop behind consolidation? Random sampling and LLM to merge?
1. Pull the N most recent active memories (default 30) with embeddings.
2. Compute pairwise cosine similarity, threshold 0.85.
3. For each similar pair, check whether they share extracted entities.
4. Shared entities + similarity 0.85-0.98 -> flag as a potential contradiction (same topic, maybe different facts).
5. No shared entities + similarity > 0.85 -> redundancy (mark for consolidation).
6. Second pass at a 0.65 threshold, specifically for substitution-category pairs (e.g., "MySQL" vs "PostgreSQL" in otherwise-similar sentences); these are usually real contradictions even at lower similarity.
7. Consolidation then collapses the redundancy set into canonical memories with combined importance/certainty.

No LLM call, no randomness. Reproducible, cheap, runs in a background tick every ~5 minutes.
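The similarity-plus-entity-overlap core of that loop fits in a few lines. This is an illustrative Python sketch of the algorithm as described, assuming unit-normalized embeddings; it is not YantrikDB's Rust implementation, and it omits the 0.65 substitution-pair second pass and the final collapse step:

```python
import numpy as np

def consolidation_pass(embeddings, entities, hi=0.85, dup_cap=0.98):
    """Pairwise cosine similarity over unit-normalized embeddings plus
    an entity-overlap check, following the loop described above."""
    sims = embeddings @ embeddings.T          # cosine sim for unit vectors
    conflicts, redundant = [], []
    n = len(entities)
    for i in range(n):
        for j in range(i + 1, n):
            s = sims[i, j]
            if s <= hi:
                continue
            if entities[i] & entities[j]:
                if s <= dup_cap:
                    conflicts.append((i, j))  # same topic, maybe different facts
            else:
                redundant.append((i, j))      # mark for consolidation
    return conflicts, redundant

# Toy data: memories 0 and 1 are similar (cosine 0.95) and share an
# entity, so they land in the contradiction queue; memory 2 is unrelated.
emb = np.array([[1.0, 0.0],
                [0.95, np.sqrt(1 - 0.95 ** 2)],
                [0.0, 1.0]])
ents = [{"acme"}, {"acme"}, {"paris"}]
conflicts, redundant = consolidation_pass(emb, ents)
print(conflicts, redundant)  # [(0, 1)] []
```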
The LLM could improve this (better merge decisions, better entity alignment) but the tradeoff is cost and non-determinism. v1 is deterministic on purpose.
Source: crates/yantrikdb-core/src/cognition/triggers.rs and consolidate.rs next to it.
* think() consolidates similar memories into canonical ones (not just deduplication, actual collapse of redundant facts).
* Contradiction detection: when "CEO is Alice" and "CEO is Bob" both exist in memory, it flags the pair as a conflict the agent can resolve.
* Temporal decay with configurable half-life: memories fade, so old unimportant stuff stops polluting recall.

Supermemory does more on the cloud side (team sharing, permissions, integrations). YantrikDB does more on the "actively manage my agent's memory" side. Different optimization points; no dig at Supermemory.
Also, any open-source local or self-hosted options?
* duplicates per query (top-10): 0.9 -> 0.0
* top-result correct: 75% -> 87.5%
* 11 consolidations in 80ms
* conflicts detected: 0 of 6 seeded <- this one matters

Turns out conflict detection runs on graph edges, and /v1/remember doesn't auto-extract entities, so contradictions sit there invisibly until you explicitly call relate(). That's a UX gap, not a missing feature, but it breaks the "drop memories in, get contradictions out" mental model. Filed as issues #1 and #2.

Dataset + script + raw results: https://gist.github.com/spranab/49c618d3625dc131308227103af5....

Honest benches surface the kind of thing demos hide; thanks for pushing.