Posted by simple10 3 hours ago
A few interesting learnings from building and using this:
- Claude code hooks are blocking - performance degrades rapidly if you have a lot of plugins that use hooks
- Hooks provide a lot more useful info than OTEL data
- Claude's jsonl files provide the full picture
- Lifecycle management of MCP processes started by plugins is a bit kludgy at best
The biggest takeaway is how much of a difference it made in claude performance when I switched to background (fire and forget) hooks and removed all other plugins. It's easy to forget how many claude plugins I've installed and how they affect performance.
The Agents Observe plugin uses docker to start the API and dashboard service. This is a pattern I'd love to see used more often for security (think Axios hack) reasons. The tricky bit was handling process management across multiple claude instances - the solution was to have the server track active connections then auto shut itself down when not in use. Then the plugin spins it back up when a new session is started.
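To make the auto-shutdown idea concrete, here is a minimal sketch of connection tracking with an idle timeout. It is not the plugin's actual code: the `IdleTracker` class, its method names, and the timeout value are all hypothetical, and a real server would call `should_shut_down()` from a periodic timer.

```python
import time


class IdleTracker:
    """Hypothetical helper: counts active connections and reports when
    the server has been idle long enough to shut itself down."""

    def __init__(self, idle_timeout_s: float, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._active = 0
        # No connections yet, so the idle period starts immediately.
        self._idle_since = clock()

    def connect(self) -> None:
        """Call when a session or dashboard client connects."""
        self._active += 1
        self._idle_since = None

    def disconnect(self) -> None:
        """Call when a client goes away; starts the idle clock at zero clients."""
        self._active = max(0, self._active - 1)
        if self._active == 0:
            self._idle_since = self._clock()

    def should_shut_down(self) -> bool:
        """True once there have been no clients for at least idle_timeout_s."""
        if self._idle_since is None:
            return False
        return self._clock() - self._idle_since >= self.idle_timeout_s
```

The clock is injectable so the timeout logic can be exercised without actually waiting. The plugin side would then only need to check whether the server is reachable at session start and launch it if not.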
This tool has been incredibly useful for my own daily workflow. Enjoy!
The opacity problem is the one I hit hard: when a coordinator spawns 3-4 agents in parallel (builder, reviewer, tester, each with their own tool calls), the only visibility you have is what they choose to report back. Which is often sanitised and … dangerously optimistic.
The role separation / independent verification structure I run helps catch bad outputs, but it doesn't give me the live timeline of HOW an agent got to a conclusion. That's why I find this genuinely useful.
Noticed OpenClaw is already on the roadmap - had my hands tingling to fork and adapt it. Starring it for now and added to my watchlist. The hook architecture should translate … OpenClaw fires session events that could feed the same pipeline. Looking forward to seeing that happen.
The docker-based service pattern is smart too. I went a different direction for my own setup -- tmux sessions with worktree isolation per agent, which keeps things lightweight but means I have zero observability into what each agent is actually doing beyond tailing logs manually. This solves that gap in a way that doesn't add overhead to the agent itself, which is the right tradeoff.
Curious about one thing -- how does the dashboard handle the case where a sub-agent spawns its own sub-agents? Does it track the full tree or just one level deep?
[Edit] When claude spawns sub-agents, they inherit the parent's hooks. So all sub-agent activity gets logged by default.
Now I'm regretting not going deeper on these. This is the type of interface that I think will be perfect for some things I want to demonstrate to a greater audience.
Now that we have the actual internals I have so many things I want to dig through.
When I go home to my $20 plan I am sad and annoyed, but I don't want to pay more for what is good enough for me to work a bit at a time. It's a good pomodoro timer for me personally.
Something like this is perfect for some of the issues that I've wanted to solve as a command and control tool with malleable visuals.
OP: This is cool, thank you for sharing.
It's super important to check your plugins or use a proxy to inspect raw prompts. If you have a lot of skills and plugins installed, you'll burn through tokens 5-10x faster than normal.
Also have claude use sub-agents and agent teams. They're significantly lighter on token usage when they're spawned with fresh context windows. You can see in the Agents Observe dashboard exactly what prompt and response claude is using for spawning sub-agents.
Basic flow:
1. Plugin registers hooks that call a dumb pipe script that sends hook event data to the API server
2. Server parses events and stores them in sqlite by session and agent id - mostly just stores data, minimal processing
3. Dashboard UI uses websockets to get real-time events from the server
4. UI does most of the heavy lifting by parsing events, grouping by agent / sub-agent, extracting out tool calls to dynamically create filters, etc.
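Steps 2 and 4 of the flow above could be sketched roughly like this. This is an illustration only, not the project's code: the table layout, the function names, and the event fields `session_id`, `agent_id`, and `tool_name` are assumptions made for the example.

```python
import json
import sqlite3
from collections import defaultdict


def open_store(path=":memory:"):
    """Open (or create) the event store. The schema is a guess for illustration."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "agent_id TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return db


def store_event(db, event):
    """Step 2: store the raw event keyed by session and agent, no processing."""
    db.execute(
        "INSERT INTO events (session_id, agent_id, payload) VALUES (?, ?, ?)",
        # Events without an agent id are attributed to a hypothetical "main" agent.
        (event["session_id"], event.get("agent_id", "main"), json.dumps(event)),
    )
    db.commit()


def group_for_ui(db, session_id):
    """Step 4: group events by agent and collect tool names for filters."""
    by_agent = defaultdict(list)
    tools = set()
    rows = db.execute(
        "SELECT agent_id, payload FROM events WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    for agent_id, payload in rows:
        event = json.loads(payload)
        by_agent[agent_id].append(event)
        if event.get("tool_name"):
            tools.add(event["tool_name"])
    return dict(by_agent), sorted(tools)
```

Keeping the payload as an opaque JSON blob is what lets the server stay unopinionated: any grouping or filtering logic lives on the reading side.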
It took a lot of iterations to keep things simple and performant.
You can easily modify the app/client UI code to fully customize the dashboard. The API app/server is intentionally unopinionated about how events will be rendered. This was by design to add support for other agent events soon.
Node has a 30-50ms cold start overhead. Then there's overhead in the hook script to read local config files, make http request to server, and check for callbacks. In practice, this was about 50-60ms per hook.
The background hook shim reduces latency to around 3-5ms (10x improvement). It was noticeable when using agent teams with 5+ sub-agents running in parallel.
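For anyone curious what "fire and forget" means in practice, here is a rough sketch of the pattern in Python. It is not the plugin's shim (which would need to avoid interpreter startup cost to hit single-digit milliseconds); `fire_and_forget` and `sender_argv` are hypothetical names, and the point is only that the hook hands off the payload and returns without waiting for the HTTP request.

```python
import subprocess
import sys


def fire_and_forget(payload: str, sender_argv: list[str]) -> int:
    """Start a detached sender process, pass it the payload on stdin,
    and return immediately without waiting for it to finish."""
    proc = subprocess.Popen(
        sender_argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: detach from the hook's session
    )
    # Small payloads fit in the pipe buffer, so this write does not
    # wait for the child to do its (slow) network work.
    proc.stdin.write(payload.encode("utf-8"))
    proc.stdin.close()
    return proc.pid


if __name__ == "__main__":
    # Hypothetical usage: the child here just discards what it reads.
    # A real sender would POST the payload to the API server instead.
    child = "import sys; sys.stdin.read()"
    fire_and_forget(sys.stdin.read(), [sys.executable, "-c", child])
```

The tradeoff is that the hook can no longer return anything to claude based on the server's response, so this only fits hooks that are purely observational.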
But the real speed up was disabling all the other plugins I had been collecting. It piles up fast and is easy for me to forget what's installed globally.
I've also started periodically asking claude to analyze its prompts to look for conflicts. It's shockingly common for plugins and skills to end up with contradictory instructions. Opus works around it just fine, but it's unnecessary overhead for every turn.
The next big layer for my personal stack is full orchestration. Something like Paperclip but much more specialized for my use cases.