Posted by ujjwalreddyks 4 days ago
I built Axiomeer, an open-source marketplace protocol for AI agents. The idea: instead of hardcoding tool integrations into every agent, agents shop a catalog at runtime, and the marketplace ranks, executes, validates, and audits everything.
How it works:
- Providers publish products (APIs, datasets, model endpoints) via 10-line JSON manifests
- Agents describe what they need in natural language or structured tags
- The router scores all options by capability match (70%), latency (20%), cost (10%) with hard constraint filters
- The top pick is executed, output is validated (citations required? timestamps?), and evidence quality is assessed deterministically
- If the evidence is mock/fake/low-quality, the agent abstains rather than hallucinating
- Every execution is logged as an immutable receipt
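To make the routing step concrete, here is a minimal sketch of weighted scoring behind hard constraint filters. It is not Axiomeer's actual code: the `Product` fields, the `rank` and `score` functions, and the way latency and cost are normalized against the caller's limits are all assumptions for illustration. Only the 70/20/10 weights come from the description above.

```python
from dataclasses import dataclass

# Weights from the post: capability 70%, latency 20%, cost 10%.
WEIGHTS = {"capability": 0.7, "latency": 0.2, "cost": 0.1}


@dataclass
class Product:
    # Hypothetical catalog entry; field names are assumptions.
    name: str
    capability: float  # 0..1 match against the request, assumed pre-computed
    latency_ms: float
    cost_usd: float
    has_citations: bool


def _inverse(value, limit):
    # Lower is better, so map value onto 0..1 where 1.0 means "free/instant".
    if limit <= 0:
        return 1.0
    return 1.0 - min(value / limit, 1.0)


def score(p, max_latency_ms, max_cost_usd):
    return (WEIGHTS["capability"] * p.capability
            + WEIGHTS["latency"] * _inverse(p.latency_ms, max_latency_ms)
            + WEIGHTS["cost"] * _inverse(p.cost_usd, max_cost_usd))


def rank(products, max_latency_ms, max_cost_usd, require_citations=False):
    # Hard constraints filter first; only the survivors are scored.
    eligible = [
        p for p in products
        if p.latency_ms <= max_latency_ms
        and p.cost_usd <= max_cost_usd
        and (p.has_citations or not require_citations)
    ]
    return sorted(eligible,
                  key=lambda p: score(p, max_latency_ms, max_cost_usd),
                  reverse=True)
```

Filtering before scoring means a product that violates a hard constraint can never win on capability alone, which seems to be the point of keeping the constraints separate from the weights.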
The trust layer is the part I think is missing from existing approaches. MCP standardizes how you connect to a tool server. Axiomeer operates one layer up: which tool, from which provider, and can you trust what came back?
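As a rough sketch of the "can you trust what came back?" step, the check below uses only deterministic rules (no LLM call) and abstains when any rule fails. The output field names (`citations`, `retrieved_at`, `source`, `answer`), the `MOCK_MARKERS` list, and both function names are hypothetical and not taken from the Axiomeer codebase.

```python
from datetime import datetime

# Assumed markers for mock or fake providers; purely illustrative.
MOCK_MARKERS = ("mock", "fake", "placeholder")


def assess_evidence(output, require_citations=True, require_timestamp=True):
    """Return (ok, reasons) from deterministic checks on a provider output dict."""
    reasons = []
    if require_citations and not output.get("citations"):
        reasons.append("missing citations")
    if require_timestamp:
        try:
            # Expects an ISO 8601 string such as 2024-05-01T12:00:00+00:00.
            datetime.fromisoformat(str(output.get("retrieved_at", "")))
        except ValueError:
            reasons.append("missing or invalid timestamp")
    source = str(output.get("source", "")).lower()
    if any(marker in source for marker in MOCK_MARKERS):
        reasons.append("mock source")
    return (not reasons, reasons)


def answer_or_abstain(output):
    # Abstain with explicit reasons instead of passing weak evidence along.
    ok, reasons = assess_evidence(output)
    if not ok:
        return {"status": "abstain", "reasons": reasons}
    return {"status": "ok", "answer": output["answer"]}
```

Returning the reasons alongside the abstention keeps the decision auditable, which fits the receipt logging described above.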
Stack: Python, FastAPI, SQLAlchemy, Ollama (local LLM, no API keys). v1 ships with weather providers (Open-Meteo + mocks). The architecture supports any HTTP endpoint that returns structured JSON.
Looking for contributors to add real providers across domains (finance, search, docs, code execution). Each provider is ~30 lines + a manifest.
The trust/validation layer is the interesting part here. We run ~20 autonomous AI agents on BoTTube (bottube.ai) that create videos, comment, and interact with each other. The hardest problem by far has been exactly what you're describing: knowing whether an agent's output is grounded vs hallucinated. We ended up building a similar evidence-quality check where agents that can't back up a claim just abstain.
Curious how the routing score weights (70/20/10) were chosen. Have you experimented with letting agents adjust those weights based on task type? For something like content generation the capability match matters way more than latency, but for real-time data feeds you'd probably want to flip that.
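One way the per-task idea could look, as a sketch only: keep 70/20/10 as the default and let a task type select a different profile. The profile names and their numbers are made up for illustration, and each signal is assumed to be already normalized to 0..1 with higher meaning better.

```python
# Default weights from the original post.
DEFAULT_WEIGHTS = {"capability": 0.7, "latency": 0.2, "cost": 0.1}

# Hypothetical per-task profiles; the values are illustrative, not tuned.
TASK_WEIGHTS = {
    "content_generation": {"capability": 0.85, "latency": 0.05, "cost": 0.10},
    "realtime_feed": {"capability": 0.20, "latency": 0.70, "cost": 0.10},
}


def weights_for(task_type=None):
    # Unknown task types fall back to the default profile.
    weights = TASK_WEIGHTS.get(task_type, DEFAULT_WEIGHTS)
    total = sum(weights.values())
    # Normalize so a hand-edited profile still sums to 1.
    return {name: value / total for name, value in weights.items()}


def weighted_score(signals, task_type=None):
    # signals: dict with "capability", "latency", "cost", each in 0..1.
    weights = weights_for(task_type)
    return sum(weights[name] * signals[name] for name in weights)
```

With these made-up profiles, a slow but highly capable provider outranks a fast, weaker one for content generation, and the order flips for a real-time feed.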