
Posted by zknill 2 days ago

All your agents are going async (zknill.io)
68 points | 44 comments
sonink 2 hours ago|
I was of the same view, but there is this other trend putting sync back in favor: agents are becoming faster. If they're faster, it makes sense to stick around, maintain your 'context' about the task, and supervise in real time. The other thing that might keep sync in fashion is that LLM providers are cutting back on cheap tokens, so you have a bigger incentive to stick around and make sure your agent is not going astray.

The only place I use async now is when I am stepping away and there are a bunch of longer tasks on my plate. So I kick them off and review them whenever I log in next. However, I don't use this pattern all that much, and even then I am not sure the context switching whenever I get back is really worth it.

Unless agents get more reliable on long-horizon tasks, it seems async will have limited utility. But I can easily see this going into videos feeding the Twitter AI launch hype train.

htahir111 4 hours ago||
How would you differentiate this from other tools like Temporal or Kitaru (https://kitaru.ai/)?
zknill 4 hours ago|
I don't know Kitaru too well, but I do know Temporal a bit.

The pattern of 'channels' I describe in the article works really well for one of the hardest bits of using a durable execution tool like Temporal. If your workflow step is long-running or async, it's often hard to 'signal' the result of the step out to some frontend client. But using channels or sessions like in the article, it becomes super easy: you write the result to the channel and it's sent in realtime to the subscribed client. No HTTP polling for results, or anything like that.
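Roughly the shape of it, as an illustrative in-memory Python sketch (not Temporal's API, and the names are made up; a real system would back the channel with a durable transport so messages survive reconnects):

```python
import queue
import threading

# Hypothetical in-memory broker: channel name -> subscriber queues.
subscribers: dict[str, list[queue.Queue]] = {}

def subscribe(channel: str) -> queue.Queue:
    """A frontend client subscribes to a channel and waits on its queue."""
    q: queue.Queue = queue.Queue()
    subscribers.setdefault(channel, []).append(q)
    return q

def publish(channel: str, message: str) -> None:
    """A workflow step writes its result; every subscriber gets it in realtime."""
    for q in subscribers.get(channel, []):
        q.put(message)

def slow_step(channel: str) -> None:
    # Stand-in for a long-running workflow step.
    publish(channel, "step finished")

client = subscribe("task-123")
threading.Thread(target=slow_step, args=("task-123",)).start()
print(client.get(timeout=5))  # delivered as soon as the step publishes
```

The client just blocks on (or subscribes to) the channel instead of polling an endpoint for the step's result.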

htahir111 3 hours ago||
so to be clear, this should be used "instead of" rather than "on top of" durable execution engines?
TacticalCoder 4 hours ago||
> ... and streaming the tokens back on the HTTP response as an SSE stream

> So how are folks solving this?

$5 per month dedicated server, SSH, tmux.

dist-epoch 4 hours ago||
Can anybody explain why switching away from the chat app on the phone so often breaks the conversation?

Having long-lived requests, where you submit one, get back a request_id, and then poll for its status, is a 20-year-old solved problem.
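The pattern in question, as a minimal illustrative sketch (names are invented for the example):

```python
import threading
import time
import uuid

# Submit work, get a request_id back, poll for status later.
jobs: dict[str, dict] = {}

def submit(task) -> str:
    request_id = str(uuid.uuid4())
    jobs[request_id] = {"status": "running", "result": None}

    def run():
        jobs[request_id]["result"] = task()
        jobs[request_id]["status"] = "done"

    threading.Thread(target=run).start()
    return request_id

def poll(request_id: str) -> dict:
    return jobs[request_id]

rid = submit(lambda: 2 + 2)           # submit once, keep only the id
while poll(rid)["status"] != "done":  # the client can drop and resume anytime
    time.sleep(0.01)
print(poll(rid)["result"])            # -> 4
```

The point being: the client holds only the id, so backgrounding the app and coming back later loses nothing.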

Why is this such a difficult thing to do in practice for chat apps? Do we need ASI to solve this problem?

zknill 3 hours ago|
I suspect the answer is that the AI chat-app is built so that the LLM response tokens are sent straight into the HTTP response as an SSE stream, without the intermediate state being stored in a database. BUT the 'full' response _is_ stored in the database once the LLM stream is complete, just not the intermediate tokens.

If you look at the GIFs of the Claude UI in this post[1], you can see how the HTTP response is broken on page refresh, but some time later the full response is available again because it's now being served 'in full' from the database.

[1]: https://zknill.io/posts/chatbots-worst-enemy-is-page-refresh...
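That failure mode, sketched in illustrative Python (not any vendor's actual code): tokens go straight to the response stream, and the database only sees the message once the stream completes.

```python
database: dict[str, str] = {}  # message_id -> full response, written at the end

def stream_response(message_id: str, tokens):
    buffered = []
    for tok in tokens:
        buffered.append(tok)
        yield tok                     # sent to the client as an SSE chunk
    # Only now, after the stream completes, is the full response persisted.
    database[message_id] = "".join(buffered)

stream = stream_response("msg-1", iter(["Hel", "lo ", "world"]))
first = next(stream)          # the client has received "Hel" so far
print(database.get("msg-1"))  # refresh mid-stream: None, nothing stored yet
for tok in stream:            # let the stream run to completion
    pass
print(database["msg-1"])      # now served 'in full' from the database
```

A refresh mid-stream finds nothing to re-render; once the generator finishes, the full text is there.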

petesergeant 4 hours ago||
at https://agentblocks.ai we just use Google-style LROs for this, do we really need a "durable transport for AI agents built around the idea of a session"?
zknill 4 hours ago|
Assuming LROs are "long-running operations", then you kick off some work with an API request and get some ID back. Then you poll some endpoint with that ID until the operation is "done". This can work, but when you try to build token-streaming into this model, you end up having to thread every token through a database (which can work), and increasing the latency the user experiences as you poll for more tokens/completion status.

Obviously polling works, it's used in lots of systems. But I guess I am arguing that we can do better than polling, both in terms of user experience, and the complexity of what you have to build to make it work.

If your long running operations just have a single simple output, then polling for them might be a great solution. But streaming LLM responses (by nature of being made up of lots of individual tokens) makes the polling design a bit more gross than it really needs to be. Which is where the idea of 'sessions' comes in.
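To make the contrast concrete, here's an illustrative Python sketch of both designs (the names are made up for the example): polling for tokens means persisting every intermediate token and having the client re-fetch from an offset, while a session just pushes each token once as it is produced.

```python
import queue

# Polling design: every token is written somewhere durable, and the
# client re-fetches from an offset it has to track itself.
token_store: list[str] = []  # stands in for a database table

def poll_tokens(offset: int) -> list[str]:
    return token_store[offset:]

# Session design: each token is pushed once, as it is produced.
session: queue.Queue = queue.Queue()

for tok in ["Hi", " there"]:  # the LLM emitting tokens
    token_store.append(tok)   # polling design: write, then wait to be polled
    session.put(tok)          # session design: delivered immediately

print(poll_tokens(0))                 # poll returns everything so far
print(session.get() + session.get())  # session delivers tokens in order
```

With polling, latency is bounded below by the poll interval and the store sees one write per token; with a session, the client sees each token as soon as it exists.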
