Posted by bsuh 19 hours ago
Here's a pretty specific example of what I mean, but maybe food for thought:
Podcast (20 minute digest): https://pub-6333550e348d4a5abe6f40ae47d2925c.r2.dev/EP008.ht...
The second it works, bake the workflow into the harness. Yesterday I did exactly that, and the whole agent loop disappeared: with careful context construction, the process condensed into a single one-shot request (plus one MorphLLM fast apply). (It was an Autoresearcher.)
Making an unreliable, nondeterministic system give reliable results for a bounded task with well-understood parameters is... like half of engineering, no?
There's a huge difference between "generate this code, here's a vague feature description" and "here's a list of criteria, assign this input to one of these buckets". The latter is obviously still subject to prompt engineering, hallucination, etc. -- but so is a human pipeline!
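To make the "bounded task" point concrete, here's a minimal sketch of criteria-based bucketing with a validation gate. `call_llm` is a hypothetical stand-in for whatever model API you actually use; the bucket names are made up. The key move is refusing any answer outside the allowed set rather than trusting free-form text.

```python
# Sketch: constrain an LLM to a fixed set of buckets and validate its answer.
# `call_llm` is a hypothetical callable (prompt -> str); swap in your own client.

BUCKETS = ["billing", "bug_report", "feature_request", "other"]

def build_prompt(text: str) -> str:
    criteria = "\n".join(f"- {b}" for b in BUCKETS)
    return (
        "Assign the message to exactly one bucket.\n"
        f"Valid buckets:\n{criteria}\n"
        f"Message: {text}\n"
        "Reply with the bucket name only."
    )

def classify(text: str, call_llm) -> str:
    answer = call_llm(build_prompt(text)).strip().lower()
    # Reject anything outside the allowed set instead of trusting free text.
    if answer not in BUCKETS:
        raise ValueError(f"model returned invalid bucket: {answer!r}")
    return answer
```

The validation step is what turns "nondeterministic text generator" into something a pipeline can depend on: an out-of-set answer becomes a retryable error, not silent garbage downstream.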
...which is why we write deterministic code to take the human out of the pipeline. One of the early uses of computers was calculating firing tables for artillery, to replace teams of humans that were doing the calculations by hand (and usually with multiple humans performing each calculation to catch errors). If early computers had a 99% chance of hallucinating the wrong answer to an artillery firing table, the response from the governments and militaries that used them would not be to keep using computers to calculate them. It would be to go back to having humans do it with lots of manual verification steps and duplicated work to be sure of the results.
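The "multiple humans per calculation" pattern translates directly to unreliable automated systems: run the computation several times and accept an answer only when a clear majority agrees. A minimal sketch (the `runs`/`quorum` numbers are illustrative, not tuned):

```python
# Sketch: redundancy + majority vote over a nondeterministic computation,
# echoing the duplicated human calculations used for firing tables.
from collections import Counter

def majority_result(compute, runs: int = 5, quorum: int = 3):
    """Call `compute` several times; return the modal answer if it
    reaches quorum, otherwise refuse rather than guess."""
    counts = Counter(compute() for _ in range(runs))
    answer, votes = counts.most_common(1)[0]
    if votes < quorum:
        raise RuntimeError(f"no consensus: {dict(counts)}")
    return answer
```

Note this only buys you anything when errors are uncorrelated; if the system hallucinates the *same* wrong answer consistently, voting happily confirms it.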
If you're trying to make LLMs (a vague simulacrum of humans) with their inherent and unsolvable[1] hallucination problems replace deterministic systems, people are going to eventually decide to return to the tried and true deterministic systems.
But if you're trying to tell me that every time you list criteria you get them all perfectly matched, you're clearly gifted.
Chaotic, though, is I believe the better word here: a single-letter change may produce wildly different results.
We just choose to use more random inference rules, because they have better results.
Somewhere in between, I guess, are the varying levels of intelligence, each more or less likely to make the "right" decision for anything you throw at it.
Determinism is a different matter. Scripts and hooks are really the main levers you can pull there, but yeah: a decent script and a cron job will handle certain things much better (and for a fraction of the cost).
0 - https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-...
My first thought was: agents seem nice, but AI workflows are a better bet. However, I didn't really understand AI or agents in depth, and I felt like I was just "doing things the old way" and that removing flexibility from agents was a ridiculous idea.
After some research I got the impression that I was right. A well-defined workflow and scope are just what's needed for AI: cheaper and more consistent. It probably even makes the whole thing run well with non-SOTA models.
https://github.com/yieldthought/flow
Happily, 5.5 is good at writing and using it.
Deterministic workflows that use AI to perform the steps not requiring human input have been an area of interest for me for some time. It's particularly interesting how you use the AI to determine what a step has achieved and what the next step's action should be.
Combining it with workflow elements that do handle human steps, together with a notification/routing/task system, would make for a helpful system for a lot of people.
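The mixed human/AI workflow described above can be sketched as a fixed step list where each step is either delegated to an AI or routed to a person. Everything here is illustrative: `Step`, `run_workflow`, and the callables are hypothetical names, not any real framework's API. The step *order* stays deterministic; only the AI-filled steps are nondeterministic.

```python
# Sketch: a deterministic step sequence where AI performs some steps and
# human steps are routed out via a notification callback.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    ai_fn: Optional[Callable[[dict], dict]] = None  # None => human step

def run_workflow(steps, state: dict, notify_human) -> dict:
    for step in steps:
        if step.ai_fn is not None:
            state = step.ai_fn(state)                # AI performs the step
        else:
            state = notify_human(step.name, state)   # route to a person
    return state
```

In a real system `notify_human` would enqueue a task and block or suspend until someone responds; here it's just a callback so the control flow is visible.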
This is the only way to guarantee AI usage doesn't burn you. Any automation beyond this is just theater, no matter how much that hurts to hear/undermines your business model.
A bird sings, a duck quacks. You don't expect the duck to start singing now, do you?
If a business can get away with some margin of error being acceptable, more power to them. But if not (or if accepting it would cause additional problems, which I'd imagine is true for a non-trivial number of orgs), it's wise to consider the nature of the tool so many people are calling mandatory when you depend on consistent, predictable results.
Presuming you meant burns you out though.
It will make a mistake and you will get burned, so you have to babysit it.