memothon 6 hours ago
I think the real problem with using DSPy is that many of the problems people are trying to solve with LLMs (agents, chat) don't have an obvious path to evaluation. You have to think carefully about how to build up a training and evaluation dataset that you can hand to DSPy for optimization. That takes a ton of upfront work and careful thinking, and as soon as you move the goalposts of what you're trying to achieve, you also have to update the training and evaluation dataset to cover the new use case. This can actually get in the way of moving fast. Often teams are not yet trying to optimize their prompts; they're still trying to figure out what the set of questions and right answers should even be!
sbpayne 6 hours ago | parent
Yeah, I think DSPy often does not really show its benefit until you have a good "automated metric", which can be difficult to get to. The unfortunate part is that the way it encourages you to structure your code is good for other reasons that might not be an acute pain, and over time it seems inevitable you'll end up building something that looks like it.
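For readers unfamiliar with what an "automated metric" means here, a minimal sketch in plain Python (no dspy dependency; the names `exact_match`, `dev_set`, and `evaluate` are illustrative, not part of DSPy's API) of the ingredients a DSPy-style optimizer needs: a labeled dev set plus a scoring function it can call on every candidate program.

```python
def exact_match(example: dict, prediction: str) -> float:
    """Score 1.0 if the predicted answer matches the gold label, else 0.0."""
    return float(prediction.strip().lower() == example["answer"].strip().lower())

# The labeled dev set both commenters describe as the hard, upfront work:
# each entry pairs an input with a known-good answer.
dev_set = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]

def evaluate(predict, metric, examples) -> float:
    """Average metric score over the dev set.

    `predict` stands in for a prompted LLM call or compiled program;
    an optimizer would call this repeatedly while searching over
    prompt variants, keeping the one with the highest score.
    """
    return sum(metric(ex, predict(ex["question"])) for ex in examples) / len(examples)
```

The point of the thread follows directly from this shape: the metric and dev set are the optimization target, so whenever the goalposts move, `dev_set` has to be rebuilt before the optimizer is useful again.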