kgeist 7 hours ago
Just a week ago, I rewrote our RAG pipeline to use structured outputs, and the tests showed no significant difference in quality after a few tweaks (under vLLM). What helped was that we already have a pipeline where another LLM automatically scores question/expected-answer pairs, so we iterated: tweak the schema/prompt => evaluate => tweak again, until we got good results in most cases, just like with free-form prompts. We ran into several issues:

1. A model may sometimes get stuck generating whitespace at the end forever (the JSON schema allows it), which can lock up the entire vLLM instance. The fix was to switch to xgrammar, which has a handy option that disallows whitespace outside of strings.

2. In some cases I had to fiddle with constraints like minItems/maxItems for arrays, or the model would either hallucinate or refuse to generate anything.

3. Inference engines may reorder the fields during generation, which can hurt quality because of the autoregressive nature of LLMs (e.g., the "calculation" field must come before the "result" field). Make sure the fields are not reordered.

4. Field names should be as descriptive as possible, to guide the model toward generating the expected data in the expected form. For example, "durationInMilliseconds" instead of just "duration".

Basically, you can't expect a model to give you good results out of the box with structured outputs if the schema is poorly designed or underspecified.
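To make points 2-4 concrete, here's a minimal sketch of such a schema sent to a vLLM OpenAI-compatible server. The field names, the array bounds, and the extra_body/guided_json request shape are assumptions for illustration, not the exact setup described above; check your vLLM version's structured-output docs for the current parameter names.

    # Hypothetical schema illustrating points 2-4; names and bounds are made up.
    from openai import OpenAI

    schema = {
        "type": "object",
        "properties": {
            # 3. "calculation" is declared before "result" so the model
            #    reasons before it commits to an answer.
            "calculation": {"type": "string"},
            "result": {
                "type": "object",
                "properties": {
                    # 4. Descriptive name: the unit is part of the field name.
                    "durationInMilliseconds": {"type": "integer"},
                },
                "required": ["durationInMilliseconds"],
            },
            # 2. Explicit bounds so the model neither pads the array
            #    endlessly nor refuses to emit it at all.
            "supportingQuotes": {
                "type": "array",
                "items": {"type": "string"},
                "minItems": 1,
                "maxItems": 3,
            },
        },
        "required": ["calculation", "result", "supportingQuotes"],
        "additionalProperties": False,
    }

    # Assumption: a local vLLM OpenAI-compatible server started with an
    # xgrammar-based guided-decoding backend (point 1).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="my-model",  # placeholder model name
        messages=[{"role": "user", "content": "How long does step X take, and why?"}],
        extra_body={"guided_json": schema},  # vLLM structured-output hook
    )
    print(resp.choices[0].message.content)

Note the schema keeps "calculation" ahead of "result"; whether the engine actually generates fields in that order depends on the backend, which is why point 3 is worth verifying.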