mritchie712 7 hours ago
the "Prompt Management" part of these products always seemed odd. Does anyone use it? Why? | |||||||||||||||||
dandelionv1bes 7 hours ago
I do understand why it's a product. It feels a bit like what Databricks has with model artifacts: having a repo of prompts you can track performance changes against is useful, especially if you have users other than engineers touching them (e.g. a product manager who wants to A/B test).

Having said that, I struggled a lot with actually implementing Langfuse due to numerous bugs and confusing AI-driven documentation, so to be frank I'm amazed that it's being bought. I was only on the free version in order to evaluate it and make a broader recommendation, and I wasn't particularly impressed. Mileage may vary though; perhaps it's a me issue.
pprotas 5 hours ago
Iterating on LLM agents involves testing on production(-like) data. The most accurate way to see whether your agent is performing well is to watch it work in production. You want the best results you can get from a prompt, so you use features like prompt management and A/B testing to see which version of your prompt performs better (i.e. is the best fit for the model you are using) in production.
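To make that concrete, here is a minimal sketch of A/B testing two prompt versions using the Langfuse Python SDK's prompt management. The prompt name ("summarize-doc"), the two labels, and the bucketing scheme are illustrative assumptions, not anything from the comment above:

    # Minimal sketch of A/B testing two prompt versions via Langfuse
    # prompt management. Prompt name, labels, and bucketing are
    # hypothetical; get_prompt/compile follow the Langfuse Python SDK
    # as I understand it.
    import hashlib
    from langfuse import Langfuse

    langfuse = Langfuse()  # reads LANGFUSE_* env vars for credentials

    def get_ab_prompt(user_id: str):
        # Deterministic 50/50 bucketing per user (hypothetical scheme).
        bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
        label = "variant-a" if bucket == 0 else "variant-b"
        prompt = langfuse.get_prompt("summarize-doc", label=label)
        return prompt, label

    prompt, label = get_ab_prompt("user-123")
    compiled = prompt.compile(document="...document text...")
    # Pass `compiled` to your LLM call and record `label` on the trace
    # so you can compare eval scores per prompt version later.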
cunha00 4 hours ago
We use it for our internal doc analysis tool. We can easily extract production generations, save them to datasets and test edge cases. It also lets us separate prompts into folders. With this, we have a pipeline for doc analysis with default prompts, and the user can set a custom prompt for a part of the pipeline. Execution checks for a user prompt before inference; if there isn't one, it uses the default prompt, which is already cached in code. We plan to evaluate user prompts to see which perform better and use them to improve the default prompt.
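A sketch of that fallback for a single pipeline step, assuming a hypothetical per-user folder naming convention in Langfuse; the prompt names, the in-code default text, and the error handling are illustrative only:

    # Sketch of "user prompt if set, otherwise in-code default" for one
    # pipeline step. Naming scheme and default text are hypothetical.
    from langfuse import Langfuse

    langfuse = Langfuse()

    # Default prompt shipped with (and cached in) the code.
    DEFAULT_EXTRACT_PROMPT = (
        "Extract the key fields from the document:\n{{document}}"
    )

    def resolve_prompt(step: str, user_id: str) -> str:
        """Return the user's custom prompt for this step if one exists,
        otherwise fall back to the default defined in code."""
        try:
            # Hypothetical convention: user prompts live in a per-user folder.
            prompt = langfuse.get_prompt(f"{user_id}/{step}")
            return prompt.prompt  # raw prompt text stored in Langfuse
        except Exception:
            # No custom prompt found (or Langfuse unreachable): use default.
            return DEFAULT_EXTRACT_PROMPT

    template = resolve_prompt("extract", "user-123")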