Posted by sbpayne 9 hours ago
So in practice I imagine you get at a lot of the same ideas / benefits!
useful for upcoming consultants to learn how to price services too.
Wrong. There can be a lot of subjectivity and pretending that some golden answer exists does more harm and narrows down the scope of what you can build.
My other main problem with data extraction tasks, and the reason I'm not satisfied with any of the existing eval tools, is that the schemas I write can change drastically as my understanding of the problem increases. Nothing really seems to handle that well. I mostly just resort to reading diffs of what happens when I change something and reading the input/output data very closely. Marimo is fantastic for anything visual like this btw.
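For what it's worth, the "read the diffs" workflow can be as small as the sketch below. It assumes each run's extraction result is a plain dict; `diff_outputs` and the sample records are made up for illustration, only `difflib` and `json` are real.

```python
import difflib
import json

def diff_outputs(before: dict, after: dict) -> list[str]:
    """Return a unified diff of two extraction results, serialized as sorted JSON."""
    a = json.dumps(before, indent=2, sort_keys=True).splitlines()
    b = json.dumps(after, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a, b, fromfile="before", tofile="after", lineterm=""))

# Hypothetical outputs for the same input document, before and after a schema change.
old_run = {"vendor": "Acme", "total": "120.00"}
new_run = {"vendor": "Acme", "total": 120.0, "currency": "USD"}

diff_lines = diff_outputs(old_run, new_run)
print("\n".join(diff_lines))
```

Sorting keys before diffing keeps field reordering from showing up as noise, so what remains is the type change and the added field.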
Also there is a difference between: the problem in reality → the business model → your db/application schema → the schema you send to the LLM. And to actually improve your schema/prompt you have to be mindful of the entire problem stack and how you might separate things that are handled through post processing rather than by the LLM directly.
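A rough sketch of the last two layers of that stack, assuming an invoice extraction task: the model fills a loose, string-typed schema and post-processing turns it into the strict application record. `LLMInvoice`, `AppInvoice` and `to_app` are hypothetical names, not from any library.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass
class LLMInvoice:
    """Schema sent to the LLM: loose, string-typed fields (hypothetical)."""
    vendor_name: str
    total_text: str       # e.g. "$1,200.50"
    issued_on_text: str   # e.g. "2024-03-01"

@dataclass
class AppInvoice:
    """Application/db schema: strict types (hypothetical)."""
    vendor_name: str
    total: Decimal
    issued_on: date

def to_app(raw: LLMInvoice) -> AppInvoice:
    """Post-processing step: normalization lives here, not in the prompt."""
    cleaned_total = raw.total_text.replace("$", "").replace(",", "").strip()
    return AppInvoice(
        vendor_name=raw.vendor_name.strip(),
        total=Decimal(cleaned_total),
        issued_on=date.fromisoformat(raw.issued_on_text.strip()),
    )

invoice = to_app(LLMInvoice(" Acme Corp ", "$1,200.50", "2024-03-01"))
print(invoice)
```

Keeping the two schemas separate means the LLM-facing one can change as understanding improves without forcing a db migration each time.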
> Abstract model calls. Make swapping GPT-4 for Claude a one-line change.
And in practice, seemingly random limitations, like differences in structured output schema limits between providers, can make this non-trivial. God I hate the Gemini API.
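To make the portability point concrete, here is a sketch of a pre-flight check that a schema fits a provider's limits before you swap models. The provider names and depth limits are invented placeholders, not real numbers for any API; real limits differ and change over time.

```python
# Hypothetical per-provider nesting limits (assumed values for illustration only).
MAX_SCHEMA_DEPTH = {"provider_a": 5, "provider_b": 2}

def schema_depth(schema: dict) -> int:
    """Nesting depth of 'properties' in a JSON-schema-like dict."""
    props = schema.get("properties", {})
    if not props:
        return 1
    return 1 + max(schema_depth(child) for child in props.values())

def check_portable(schema: dict, provider: str) -> bool:
    """True if the schema fits within the assumed depth limit for this provider."""
    return schema_depth(schema) <= MAX_SCHEMA_DEPTH[provider]

nested = {
    "type": "object",
    "properties": {
        "customer": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        }
    },
}

for provider in MAX_SCHEMA_DEPTH:
    print(provider, check_portable(nested, provider))
```

The one-line model swap only holds if checks like this pass for every provider you intend to target.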
I'll think about how to word this better, thanks for the feedback!
Edit: read the article, and it's really good. That cycle of AI engineering progression is spot on. Read the article too!
Are we playing philosophy here? If you move some part of the code out of the repo and into a database, then changing that database is still part of the deployment, but now you've just given your versioning an identity crisis. Just put your prompts in your git repo and say no when someone requests that an anti-pattern be implemented.
It's annoying/difficult in practice if this is strictly in code. I don't think a database is necessarily the way to go, but it's just a common pattern I see. And I really strongly believe this is more of a need for a "development time override" than the primary way to deploy to production, to be clear.
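A minimal sketch of what I mean by a development time override, assuming prompts live as text files in the repo. The `prompts` directory, the `PROMPT_OVERRIDE_<NAME>` variable and `APP_ENV` are hypothetical conventions, not from any framework.

```python
import os
from pathlib import Path

PROMPT_DIR = Path("prompts")  # hypothetical directory tracked in git

def load_prompt(name: str, prompt_dir: Path = PROMPT_DIR) -> str:
    """Load a prompt from the repo, unless a dev-time override is set.

    The override is ignored when APP_ENV is "production", so the git
    version stays the single source of truth for deployments.
    """
    override = os.environ.get(f"PROMPT_OVERRIDE_{name.upper()}")
    if override is not None and os.environ.get("APP_ENV") != "production":
        return override
    return (prompt_dir / f"{name}.txt").read_text(encoding="utf-8")
```

During development you would set something like `PROMPT_OVERRIDE_SUMMARIZE` to try a variant without a commit, and whatever works goes back into git before deploying.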
The only thing I'd grab DSPy for at this point is to automate the edges of the agentic pipeline that could be improved with RL patterns. But if that is true, you're really shorting yourself by handing your domain over to DSPy. You should be building your own RL loops.
My experience: if you find yourself reaching for a tool like DSPy, you might be sitting on a scenario where reinforcement learning approaches would help even further up the stack than your prompts, and you're probably missing where the real optimization win is. (Think bigger)