If DSPy is so great, why isn't anyone using it?

Posted by sbpayne 7 hours ago

If DSPy is so great, why isn't anyone using it?(skylarbpayne.com)

189 points | 110 commentspage 3

_andrei_ 5 hours ago|

Almost all the points are not about what DSPy is mainly supposed to offer. What's supposedly great at is automatic optimization, for everything else... who the hell puts Python in production just to make some API calls? There are "frameworks" available in all the better languages, but the constructs behind are not that complicated. And why does DSPy even try to compete with LangChain/Graph/crap?

sbpayne 5 hours ago|

I think automatic optimization is valuable, but it's not what Dspy "is"; you can see this consistently through @lateinteraction's tweets.

And hopefully it's clear enough from the post: I'm not necessarily suggesting people use Dspy, just that there are important lessons to take with you, even if you don't use it :)

benh2477 3 hours ago||

The adoption gap feels real. My experience is that developers don't trust AI outputs enough to build production workflows around them yet — the missing piece isn't better prompting frameworks, it's confidence signals that tell you when to trust the output.

ijk 7 hours ago||

This matches my experience with Dspy. I ended up removing it from our production codebase because, at the time, it didn't quite work as effectively as just using Pydantic and so forth.

The real killer feature is the prompt compilation; it's also the hardest to get to an effective place and I frequently found myself needing more control over the context than it would allow. This was a while ago, so things may have improved. But good evals are hard and the really fancy algorithms will burn a lot of tokens to optimize your prompts.

sbpayne 7 hours ago|

Yes! I have also felt this. I highly recommend taking a look at Maxime's template adapter: https://github.com/dspy-community/dspy-template-adapter

I think it solves some of this friction!

tech_hutch 3 hours ago||

I read the title as "If DarkSydePhil-y is so great, why isn't anyone using it?"

QuadmasterXLII 7 hours ago||

If you find yourself adding a database because thats less painful than regular deployments from your version control, something is hair on fire levels of wrong with your CICD setup.

sbpayne 7 hours ago|

I think this misunderstands the need for iteration! Maybe I could have written it more clearly :).

The reality is that you don't want to re-deploy for every prompt change, especially early on. You want to get a really tight feedback loop. If prompt change requires a re-deploy, that is usually too slow. You don't have to use a database to solve this, but it's pretty common to see in my experience.

ijk 7 hours ago||

I've been reaching for BAML when I really need prompt iteration at speed.

sbpayne 7 hours ago||

I consistently hear great things from Dspy users. At the same time, it feels like adoption is always low.

Stranger still: it seems like every company I have worked with ends up building a half-baked version of Dspy.

CuriouslyC 7 hours ago|

Two issues:

1. People don't want to switch frameworks, even though you can pull prompts generated by DSPy and use them elsewhere, it feels weird.

2. You need to do some up-front work to set up some of the optimizers which a lot of people are averse to.

brokensegue 7 hours ago||

i've tried it a few times and it's never really helped as much as i expected. though i know they've released a couple times since I last tried it.

sbpayne 7 hours ago|

yeah what I'm trying to get across here is that: Dspy does not solve an immediate problem, which is why many feel this way and consequently why it doesn't have great adoption!

But on the other hand, I think people unintentionally end up re-implementing a lot of Dspy.

love2read 4 hours ago||

I really enjoyed this blog format. I think it explained the problem well in a way that made it immediately clear why the solution solved the problem when shown DSPy.

sbpayne 4 hours ago|

Thank you! Let me know if anything could be more clear, always something I can improve here I'm sure :)

Silamoth 3 hours ago||

Am I the only one disappointed this was about some LLM slop and not digital signal processing? DSP is a well-established technical acronym, so I expected to hear about a new Python DSP library. Oh well.

msp26 6 hours ago|

> Data extraction tasks are amongst the easiest to evaluate because there’s a known “right” answer.

Wrong. There can be a lot of subjectivity and pretending that some golden answer exists does more harm and narrows down the scope of what you can build.

My other main problem with data extraction tasks and why I'm not satisfied with any of the existing eval tools is that the schemas I write change can drastically as my understanding of the problem increases. And nothing really seems to handle that well, I mostly just resort to reading diffs of what happens when I change something and reading the input/output data very closely. Marimo is fantastic for anything visual like this btw.

Also there is a difference between: the problem in reality → the business model → your db/application schema → the schema you send to the LLM. And to actually improve your schema/prompt you have to be mindful of the entire problem stack and how you might separate things that are handled through post processing rather than by the LLM directly.

> Abstract model calls. Make swapping GPT-4 for Claude a one-line change.

And in practice random limitations like structured output API schema limits between providers can make this non-trivial. God I hate the Gemini API.

sbpayne 6 hours ago||

This is very true! I could have been more careful/precise in how I worded this. I was really trying to just get across that it's in a sense easier than some tasks that can be much more open ended.

I'll think about how to word this better, thanks for the feedback!

sethkim 6 hours ago|||

This is extremely true. In fact, from what we see many/most of the problems to be solved with LLMs do not have ground-truth values; even hand-labeled data tends to be mostly subjective.

rco8786 6 hours ago||

I think they're just saying that data extraction tasks are easy to evaluate because for a given input text/file you can specify the exact structured output you expect from it.

More comments...