Posted by bsuh 22 hours ago
That’s why you see “base” vs “instruct” models for example — base is just that, the basic language model that models language, but doesn’t follow instructions yet.
The open-weights models especially have lots of variants, e.g. tuned for math, tuned for code, tuned for deep thinking, etc.
But it's definitely a post-training thing, usually done by generating synthetic data with other models.
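A minimal sketch of that synthetic-data step, assuming a stand-in `teacher_model` function (hypothetical here; in practice it would be a call to a stronger, already instruction-tuned model):

```python
# Sketch: building (instruction, response) pairs for instruction tuning.
# teacher_model is a placeholder; a real pipeline would call a stronger,
# already instruction-tuned model and filter its outputs for quality.

def teacher_model(prompt: str) -> str:
    # Placeholder response generator standing in for a model call.
    return f"(teacher answer to: {prompt})"

def build_sft_dataset(seed_instructions: list[str]) -> list[dict]:
    """Turn raw seed instructions into supervised fine-tuning examples."""
    return [
        {"instruction": inst, "response": teacher_model(inst)}
        for inst in seed_instructions
    ]

dataset = build_sft_dataset(["Explain recursion.", "Summarize this diff."])
```

The resulting pairs are then used for supervised fine-tuning of the base model.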
Markdown files are a good reference but they are a weak enforcement tool and go stale easily.
Avoid burying yourself in more skills docs that you're not even writing yourself and will probably never read. Put that effort into deterministic tooling instead. (Not that skills or prompts are bad; I agree a meta skill that tells an agent which subagents to run, and in what order, is useful.)
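A sketch of what "deterministic tooling" means here, using a made-up convention ("no TODO comments in source") as the rule. Instead of documenting the rule in a markdown file and hoping the agent reads it, a check script enforces it mechanically:

```python
# Sketch: enforcing a convention with a deterministic check rather than a
# markdown doc. The rule ("no TODO comments") is an arbitrary example.
import re

def check_no_todos(text: str) -> list[int]:
    """Return the 1-based line numbers that violate the rule."""
    return [
        i for i, line in enumerate(text.splitlines(), 1)
        if re.search(r"\bTODO\b", line)
    ]

violations = check_no_todos("x = 1\n# TODO: fix this\ny = 2\n")
```

Run a script like this in CI or as a hook after every agent edit; the rule never goes stale and never gets ignored.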
My personal opinion is that AI and agents are being misrepresented… The amount of setup, guidance and testing required to create a smarter version of a form is insane.
At the moment my small test is:
- Compressed instructions (to fit within the 8k limit)
- 9 different types of policies to guide the agent (JSON)
- 3 actual documents outlining domain knowledge (JSON)
- 8 topics (hint harvesting, guide rails, and the pieces of information prepared as adaptive cards for the user)
- 3 tools (to allow for connectors)
The whole thing is as robust as I can make it but it still feels like a house of cards and I expect some random hiccup will cause a failure.
We need to define agents in code, and drive them through semi-deterministic workflows. Kick subtasks off to agents where appropriate, but do things like gather context and deal with agent output deterministically.
This is a massive boost in accuracy, cost efficiency, AND speed. Stop using tokens to do the deterministic parts of the task!
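A minimal sketch of that split, with `run_agent` as a stand-in for whatever model call you use. Only the fuzzy subtask spends tokens; context gathering and output handling are plain code:

```python
# Sketch of a semi-deterministic workflow: context gathering and output
# validation are deterministic code; only the fuzzy subtask goes to an
# agent. run_agent is a placeholder for a real LLM call.
import json

def run_agent(task: str, context: dict) -> str:
    # Placeholder for an LLM call; returns a JSON string.
    return json.dumps({"summary": f"did {task}", "files": list(context)})

def workflow(files: dict[str, str]) -> dict:
    # 1. Deterministic: gather context (no tokens spent).
    context = {name: body for name, body in files.items() if body.strip()}
    # 2. Probabilistic: delegate the fuzzy subtask to the agent.
    raw = run_agent("summarize changes", context)
    # 3. Deterministic: parse and validate the agent's output.
    result = json.loads(raw)
    assert set(result) >= {"summary", "files"}, "agent output missing keys"
    return result

out = workflow({"a.py": "print(1)", "b.py": ""})
```

The deterministic steps are testable, cheap, and fail loudly; the agent only sees the part of the task that actually needs judgment.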
why tf would i ever need this
It's externally orchestrated and managed, not by an agent running the loop.
The goal is to force LLMs to produce exactly what you want every time.
I will be open-sourcing it soon. You can use whatever harness or tools you already use; you just delegate the actual implementation to the engine.
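One common way such an external loop is built (a sketch under my own assumptions, not the commenter's actual engine): the orchestrator re-prompts until the output passes a strict validator, so the model either produces exactly the required shape or the run fails explicitly. `call_llm` is a stand-in that simulates one bad attempt before complying:

```python
# Sketch of external orchestration: the retry loop lives outside the model.
# call_llm is a placeholder that pretends to fail once, then comply.
import json

def call_llm(prompt: str, attempt: int) -> str:
    if attempt == 0:
        return 'Sure! Here is the JSON: {"name": "x"}'  # invalid: extra prose
    return json.dumps({"name": "x", "version": 1})

def validated(prompt: str, required: set[str], max_tries: int = 3) -> dict:
    for attempt in range(max_tries):
        raw = call_llm(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # re-prompt: not even valid JSON
        if required <= set(data):
            return data  # exactly the shape we asked for
    raise RuntimeError("model never produced valid output")

result = validated("emit package metadata as JSON", {"name", "version"})
```

The key property is that nothing downstream ever sees unvalidated model output.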
Both designs (Lightroom, game engines) have worked successfully.
There's probably nothing that prevents mixing both approaches in the same "app".
I have yet to see a universal treatment that tackles this well.
I see this as the most robust way to build a predictable system that runs in a controlled way, taking advantage of probabilistic AIs while reducing the impact of their hallucinations.
LLMs simply can't be trusted to follow instructions in the general case, no matter how much you constrain them. The power of very large probabilistic models is that they basically solved the _frame problem_ of classic AI: logical reasoning didn't work for general tasks because you can't encode all common-sense knowledge as axioms, and inference engines lost their way trying to solve large problems.
LLMs fix those handicaps, as they contain huge amounts of real world knowledge and they're capable of finding facts relevant to the problem at hand in an efficient way. Any autonomous system using them should exploit this benefit.