mritchie712 7 hours ago
the "Prompt Management" part of these products always seemed odd. Does anyone use it? Why? | |||||||||||||||||
dandelionv1bes 7 hours ago
I do understand why it's a product. It feels a bit like what Databricks has with model artifacts: having a repo of prompts you can track performance changes against is useful, especially if you have users other than engineers touching them (e.g. a product manager who wants to A/B test).

Having said that, I struggled a lot with actually implementing Langfuse due to numerous bugs and confusing AI-driven documentation, so to be frank I'm amazed that it's being bought. I was only on the free version in order to evaluate it and make a broader recommendation, and I wasn't particularly impressed. Mileage may vary though; perhaps it's a me issue.
pprotas 5 hours ago
Iterating on LLM agents involves testing on production(-like) data. The most accurate way to see whether your agent is performing well is to watch it work in production. You want the best results you can get from a prompt, so you use features like prompt management and A/B testing to see which version of your prompt performs better (i.e. is the best fit for the model you are using) in production.
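To make that concrete, here is a minimal sketch of A/B testing two prompt versions using the Langfuse Python SDK's prompt management. The prompt name ("summarize-doc"), the two labels, and the bucketing scheme are illustrative assumptions, not anything from the comment above:

    # Minimal sketch of A/B testing two prompt versions via Langfuse
    # prompt management. Prompt name, labels, and bucketing are
    # hypothetical; get_prompt/compile follow the Langfuse Python SDK
    # as I understand it.
    import hashlib
    from langfuse import Langfuse

    langfuse = Langfuse()  # reads LANGFUSE_* env vars for credentials

    def get_ab_prompt(user_id: str):
        # Deterministic 50/50 bucketing per user (hypothetical scheme).
        bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
        label = "variant-a" if bucket == 0 else "variant-b"
        prompt = langfuse.get_prompt("summarize-doc", label=label)
        return prompt, label

    prompt, label = get_ab_prompt("user-123")
    compiled = prompt.compile(document="...document text...")
    # Pass `compiled` to your LLM call and record `label` on the trace
    # so you can compare eval scores per prompt version later.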
cunha00 4 hours ago
We use it for our internal doc analysis tool. We can easily extract production generations, save them to datasets and test edge cases. It also lets us separate prompts into folders. With this, we have a pipeline for doc analysis with default prompts, and the user can set a custom prompt for a part of the pipeline. Execution checks for a user prompt before inference; if there isn't one, it uses the default prompt, which is already cached in code. We plan to evaluate user prompts to see which perform better and use them to improve the default prompt.
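A sketch of that fallback for a single pipeline step, assuming a hypothetical per-user folder naming convention in Langfuse; the prompt names, the in-code default text, and the error handling are illustrative only:

    # Sketch of "user prompt if set, otherwise in-code default" for one
    # pipeline step. Naming scheme and default text are hypothetical.
    from langfuse import Langfuse

    langfuse = Langfuse()

    # Default prompt shipped with (and cached in) the code.
    DEFAULT_EXTRACT_PROMPT = (
        "Extract the key fields from the document:\n{{document}}"
    )

    def resolve_prompt(step: str, user_id: str) -> str:
        """Return the user's custom prompt for this step if one exists,
        otherwise fall back to the default defined in code."""
        try:
            # Hypothetical convention: user prompts live in a per-user folder.
            prompt = langfuse.get_prompt(f"{user_id}/{step}")
            return prompt.prompt  # raw prompt text stored in Langfuse
        except Exception:
            # No custom prompt found (or Langfuse unreachable): use default.
            return DEFAULT_EXTRACT_PROMPT

    template = resolve_prompt("extract", "user-123")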