Posted by mtricot 4 days ago
Show HN: Airbyte Agents – context for agents across multiple data sources
Here’s a quick walkthrough: https://www.youtube.com/watch?v=ZosDytyf1fg
As agents move into real workflows, they need access to more tools (e.g. Slack, Salesforce, Linear). That means a ton of API plumbing: authentication, pagination, filters, handling schema, and matching entities across systems.
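To show what "API plumbing" means in practice, here is a minimal sketch of just one piece of it: cursor pagination, which an agent calling an API live has to get right on every call. The page shape, `fetch_page`, and `fake_fetch` are hypothetical and are not any vendor's real API. Authentication, filters, and schema handling would be additional layers on top of this.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"items": [...], "next_cursor": "<token>" or None}
Page = dict

def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item by following cursors until no next page is reported."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break

# In-memory stand-in for a paginated endpoint (made-up data).
_PAGES = {
    None: {"items": [{"id": 1}, {"id": 2}], "next_cursor": "p2"},
    "p2": {"items": [{"id": 3}], "next_cursor": None},
}

def fake_fetch(cursor: Optional[str]) -> Page:
    return _PAGES[cursor]

ids = [item["id"] for item in iterate_all(fake_fetch)]
print(ids)  # [1, 2, 3]
```

Each tool call an agent spends on loops like this is a step (and tokens) not spent on the actual question.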
Most MCPs don’t fix this. They’re thin wrappers over APIs, so agents inherit their weak primitives and still get it wrong most of the time, especially when working across tools.
An even deeper issue is that APIs assume you already know what to query (think endpoints, object IDs, fields), whereas agents usually start one step earlier: they first need to discover what matters before they can even start reasoning.
So we built Airbyte Agents to be a context layer between your agents and all of your data. The core of this is something we call Context Store: a data index optimized for agentic search, populated by our replication connectors. All the work we've put into data connectors over the last six years comes in handy here!
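The post doesn't describe how Context Store is built, so the following is only a toy illustration of the general idea of "an index for discovery, populated ahead of time from several sources": a cross-source inverted index that returns (source, object type, record id) keys, so an agent can find relevant records before it knows which endpoint or field to query. `ContextIndex`, `ingest`, and `search` are hypothetical names, not Airbyte's API.

```python
import re
from collections import defaultdict

Key = tuple[str, str, str]  # (source, object_type, record_id)

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class ContextIndex:
    """Toy cross-source inverted index: term -> set of record keys."""

    def __init__(self) -> None:
        self.postings: dict[str, set[Key]] = defaultdict(set)

    def ingest(self, source: str, object_type: str, record_id: str, record: dict) -> None:
        # Index every string value, so callers need not know field names up front.
        key = (source, object_type, record_id)
        for value in record.values():
            if isinstance(value, str):
                for term in tokenize(value):
                    self.postings[term].add(key)

    def search(self, query: str) -> set[Key]:
        # AND semantics: a hit must contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits

index = ContextIndex()
index.ingest("salesforce", "Account", "001A", {"name": "Acme Corp", "tier": "enterprise"})
index.ingest("zendesk", "Ticket", "42", {"subject": "Acme login outage", "status": "open"})
print(sorted(index.search("acme")))
# [('salesforce', 'Account', '001A'), ('zendesk', 'Ticket', '42')]
print(index.search("acme outage"))  # {('zendesk', 'Ticket', '42')}
```

The returned keys are what would let an agent then read or write the upstream record directly, as the post describes.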
This gives agents a structured way to discover data, while still allowing them to read and write directly to the upstream system when needed.
What got us working on this was an insane trace from an agent we were migrating to our new SDK. It was supposed to answer "which customers are at risk of leaving this quarter?" The trace had 47 steps, most of them API calls. The agent first had to find a bunch of accounts, then map them to the right customers, then look for tickets, and so on. When the agent finally responded, the answer sounded OK but was wrong. Not only that, it was excruciatingly slow. So we had to do something about it.
That 47-step agent is one example of a question where Airbyte Agents does particularly well. Other examples:
- “Show me all enterprise deals closing this month with open support tickets.”
- “Find every support ticket that doesn’t have a GitHub issue opened.”
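Once both systems are already indexed, the second example becomes a local anti-join instead of a chain of API calls. This is a minimal sketch under a loudly hypothetical linking convention: it assumes an issue "links" a ticket by mentioning its id (like `T-1`) in the issue body. Real setups may link tickets and issues through integration fields instead, and the records below are made up.

```python
import re

# Hypothetical linking convention: an issue mentions the ticket id, e.g. "T-1".
TICKET_REF = re.compile(r"\bT-\d+\b")

def tickets_without_issue(tickets: list[dict], issues: list[dict]) -> list[dict]:
    """Anti-join: keep tickets whose id no GitHub issue body references."""
    referenced = {ref for issue in issues for ref in TICKET_REF.findall(issue["body"])}
    return [t for t in tickets if t["id"] not in referenced]

# Made-up records, as if already replicated from a support tool and GitHub.
tickets = [
    {"id": "T-1", "subject": "Export fails"},
    {"id": "T-2", "subject": "SSO broken"},
    {"id": "T-3", "subject": "Slow dashboard"},
]
issues = [
    {"number": 101, "body": "Fix export bug, reported in T-1"},
    {"number": 102, "body": "Investigate T-3 performance"},
]
print([t["id"] for t in tickets_without_issue(tickets, issues)])  # ['T-2']
```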
Some of these might sound simple, but the quality of the answer changes dramatically when the agent doesn’t have to assemble all that context at runtime.
Once we had an early version of the product, I spent a weekend building a benchmark harness to see if it worked. Also for fun: I like writing benchmarks :). I compared calling the Airbyte Agent MCP vs calling a bunch of vendor MCPs directly. I tested retrieval and search.
For the sake of simplicity, I used token consumption as a unit of measure. I think that’s a good proxy for how well agents are working. A failing agent (like the one that took 47 steps) will churn through lots of tokens while getting nowhere, whereas a successful one will get straight to the point.
Here's what I found: compared with each vendor's own MCP, Airbyte Agents used up to 80% fewer tokens for Gong, up to 90% fewer for Zendesk, up to 75% fewer for Linear, and up to 16% fewer for Salesforce (Salesforce’s own SOQL does a good job here).
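For anyone checking numbers like these, "X% fewer tokens" is presumably the reduction relative to the vendor-MCP baseline. A small helper makes the arithmetic explicit; the token counts are illustrative only, not taken from the benchmark, and reading "up to" as "best case across tasks" is an assumption.

```python
def percent_fewer(baseline_tokens: int, candidate_tokens: int) -> float:
    """Percentage reduction of candidate_tokens relative to baseline_tokens."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - candidate_tokens) / baseline_tokens

# Illustrative counts only, not taken from the benchmark.
print(percent_fewer(10_000, 2_000))  # 80.0
print(percent_fewer(10_000, 8_400))  # 16.0

# Assumed reading of "up to": the best case across several tasks.
runs = [(10_000, 2_000), (5_000, 4_000)]
best = max(percent_fewer(b, c) for b, c in runs)
print(best)  # 80.0
```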
Of course there is the usual obvious bias: we are the builders of what we are benchmarking. So we made the test harness public: https://github.com/airbytehq/airbyte-agents-benchmarks. Feel free to poke at it, and please tell us what you find if you do!
It's still early and some parts are rough, but we wanted to share this with the community ASAP. We'd love to hear from people building agents:
- Are you indexing data ahead of time, or letting the agent call APIs live?
- How are you matching entities across systems?
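For context on the second question, one common (and admittedly naive) approach is to join records on a normalized key such as the company's web domain. This sketch is not how Airbyte Agents matches entities (the post doesn't say); the field names `website`, `domain`, and `id` are hypothetical. Domain matching misses subsidiaries, shared email providers, and renamed companies, which is exactly why the question is hard.

```python
from typing import Optional
from urllib.parse import urlparse

def normalize_domain(value: str) -> str:
    """Reduce a website URL or an email address to a bare lowercase domain."""
    value = value.strip().lower()
    if "@" in value:
        host = value.rsplit("@", 1)[1]
    else:
        if "://" not in value:
            value = "//" + value  # make urlparse treat a bare host as the netloc
        host = urlparse(value).hostname or ""
    return host.removeprefix("www.")

def match_accounts(accounts: list[dict], organizations: list[dict]) -> dict[str, Optional[int]]:
    """Map each CRM account id to a support-org id sharing its domain (or None)."""
    by_domain = {normalize_domain(org["domain"]): org["id"] for org in organizations}
    return {acct["id"]: by_domain.get(normalize_domain(acct["website"])) for acct in accounts}

# Hypothetical records from two systems.
accounts = [
    {"id": "001A", "website": "https://www.Acme.com/about"},
    {"id": "001B", "website": "globex.io"},
]
organizations = [{"id": 900, "domain": "acme.com"}]
print(match_accounts(accounts, organizations))  # {'001A': 900, '001B': None}
```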
Would also love to hear any thoughts, comments, or ideas of how we could make this better, and if there are obvious things we’re missing. For now, we’re excited to keep building!
hmm so airbyte agents could serve as a form of MCP gateway, or a key building block of an MCP gateway, which btw is how anthropic uses mcp themselves for all their internal apps https://www.youtube.com/watch?v=CD6R4Wf3jnY&t=1s&pp=0gcJCd4K...
i think my most sad/interesting observation about ai engineers is that many ai apps are super data hungry, but many ai engineers don't have the necessary data engineering background to even know they need an airbyte or what tradeoffs to make in an etl pipeline. would love a "data engineering for ai engineers" type braindump session from someone from airbyte at AIE (https://ai.engineer/cfp)
> airbyte agents could serve as a form of MCP gateway
Exactly! And a single set of tools lets agents access both realtime data (direct reads/writes) and cached data (Context Store), hopefully giving each use case the best access path.
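Airbyte hasn't said how it decides between the two access paths, so this is only a hypothetical routing rule to illustrate the trade-off: writes always go live to the upstream system, and reads use the cached index unless it is older than the request can tolerate. `Request`, `max_staleness`, and `choose_path` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Request:
    is_write: bool
    max_staleness: timedelta  # how old the cached data may be for this request

def choose_path(req: Request, last_sync: datetime, now: datetime) -> str:
    """Return 'live' (call the upstream API) or 'cached' (query the index)."""
    if req.is_write:
        return "live"  # writes must go to the system of record
    if now - last_sync > req.max_staleness:
        return "live"  # the index is older than this read can tolerate
    return "cached"

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_sync = now - timedelta(minutes=30)
print(choose_path(Request(False, timedelta(hours=1)), last_sync, now))    # cached
print(choose_path(Request(False, timedelta(minutes=5)), last_sync, now))  # live
print(choose_path(Request(True, timedelta(hours=1)), last_sync, now))     # live
```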
> would love a "data engineering for ai engineers" type braindump ... at AIE
Great idea - we have a booth at AIE, and we'll submit there for a talk. Mario will reach out to you about this. :)
It’s definitely not old-school ETL + dbt + BI tool. It might be something like this, but it’s very early.
It’s not why we started using PostHog, but it definitely sealed the deal when you see how simple and reliable that experience is.
I am happy to hear you are still getting value out of PyAirbyte! If you do try out Airbyte Agents, please let us know how it goes! We are always listening to feedback and would love to hear from you as you explore the new tools and capabilities.
It spawns agent CLIs (Claude Code, Codex, Cursor, GitHub Copilot) with and without Unblocked's MCP server attached, then statistically compares the results: https://github.com/unblocked/unblocked-harness-compare
We likewise measured token savings, wall-clock time, number of tool calls, and number of turns.
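As a rough illustration of summarizing runs "with and without" an MCP server, here is a minimal paired comparison: per-task relative token savings, their median, and how many tasks improved. This is not the code in the linked harness, which may use proper statistical tests; the numbers in the example are made up.

```python
from statistics import median

def paired_savings(without_mcp: list[int], with_mcp: list[int]) -> dict:
    """Summarize token savings across tasks run both ways (same task order)."""
    if len(without_mcp) != len(with_mcp) or not without_mcp:
        raise ValueError("need equal-length, non-empty paired samples")
    # Relative saving per task; negative means the MCP run used more tokens.
    ratios = [(b - w) / b for b, w in zip(without_mcp, with_mcp)]
    wins = sum(1 for b, w in zip(without_mcp, with_mcp) if w < b)
    return {"median_saving": median(ratios), "wins": wins, "tasks": len(ratios)}

print(paired_savings([1000, 2000, 400], [500, 1500, 500]))
# {'median_saving': 0.25, 'wins': 2, 'tasks': 3}
```

Pairing by task matters because token cost varies far more between tasks than between the two configurations on the same task.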
If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?
Have you done any testing on guided indexing, or metadata layers on top of the data? My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful/reliable answers enough of the time. I.e. _what_ you index, and how you signpost for agents, becomes really important (unless your data is super clean I guess). This does look like a good foundation for that kind of tooling though!
> If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?
While we haven't yet published details on the backend implementation, I can say that our implementation performs very well without needing to prioritize specific fields for indexing. We aim for large text fields to perform decently and for retrieval based on small/compressible fields like ints to be fast. (More to come on this in the coming months.)
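Since the backend isn't published, this is only a toy sketch of the general idea described above: index every field without a priority list, and let the value's type decide the structure. Small values (numbers, short strings) go into an exact-match hash map for fast lookups; all string values are also tokenized into a term index for text search. `FieldAgnosticIndex` and the 64-character cutoff are hypothetical.

```python
import re
from collections import defaultdict

SHORT_VALUE_LIMIT = 64  # hypothetical cutoff for "small" string values

class FieldAgnosticIndex:
    """Indexes every field without a priority list; structure depends on value type."""

    def __init__(self) -> None:
        self.exact: dict[tuple, set[str]] = defaultdict(set)  # (field, value) -> ids
        self.terms: dict[str, set[str]] = defaultdict(set)    # term -> ids

    def add(self, record_id: str, record: dict) -> None:
        for field, value in record.items():
            small = isinstance(value, (int, float, bool)) or (
                isinstance(value, str) and len(value) <= SHORT_VALUE_LIMIT
            )
            if small:
                self.exact[(field, value)].add(record_id)  # constant-time exact match
            if isinstance(value, str):
                for term in re.findall(r"[a-z0-9]+", value.lower()):
                    self.terms[term].add(record_id)  # term lookup for text

    def lookup(self, field: str, value: object) -> set[str]:
        return set(self.exact.get((field, value), set()))

    def search(self, term: str) -> set[str]:
        return set(self.terms.get(term.lower(), set()))

idx = FieldAgnosticIndex()
idx.add("t1", {"priority": 2, "subject": "Login outage",
               "body": "Customer reports intermittent login failures since the SSO change last week."})
idx.add("t2", {"priority": 1, "subject": "Invoice question", "body": "Billing asks about a duplicate charge."})
print(idx.lookup("priority", 2))  # {'t1'}
print(idx.search("login"))        # {'t1'}
```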
> Have you done any testing on guided indexing, or metadata layers on top of the data?
We've been testing with different data scales and shapes. Nothing detailed to share yet, but performance has (so far) never itself become the bottleneck in our agent testing. (The LLM thinking itself is often the bottleneck.)
> My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful/reliable answers enough of the time.
Airbyte has rich metadata on our upstream connectors' data models, which I think helps us a lot to deliver helpful context to the agent. Another option, when optimizing for specific use cases, is to build your own agent tools on top of our Agent SDK. This lets you shape the calls and tools in whatever way makes natural sense to the agent, regardless of source shape or which system(s) the data is coming from.
> This does look like a good foundation for that kind of tooling though!
We agree! Thanks again for sharing your thoughts here.
You mentioned that performance was never an issue; I'm really intrigued by how this is achieved.
I have 3 general questions:
1. How big (estimate in bytes) and complex were the test datasources? I couldn't find this in the benchmark repo.
2. How is the business context managed? The blog post "Airbyte Agents: A New Era for Airbyte" mentioned handling business context, but the context layer docs only talk about schema discovery (I got a bit confused).
3. When you said performance was never an issue, do you mean the user always got the answer they were looking for?
(We use airbyte at my company, although we self-host it.)
(I'd guess there is actually SQL at the bottom layer, but there's no way to talk to it?)
I understand the instinct to try to make a proprietary moat around it all but I think the pattern is useful and obvious enough that all big orgs will be doing something very similar within 5 years or so.
That said, please stay tuned - and thank you again for this valuable feedback.
How do you handle encryption and confidentiality? I'm building in this space too (MCP gateway https://www.gatana.ai/), which already has semantic search for tool outputs, and ensuring encryption and confidentiality is not trivial.