LLM Structured Outputs Handbook (nanonets.com)
148 points by vitaelabitur a day ago | 26 comments

▲ HanClinto 4 hours ago
This is a seriously beautiful guide. I really appreciate you putting this together! I especially love the tab-through animations on the various pages; this is one of the best explanations I've seen. I generally feel I understand grammar-constrained generation pretty well (I've merged a handful of contributions to the llama.cpp grammar implementation), and yet I still learned some insights from your illustrations -- thank you!

I'm also really glad that you're helping more people understand this feature, how it works, and how to use it effectively. I strongly believe that structured outputs are one of the most underrated features in LLM engines, and people should be using them more. Constrained non-determinism means that we can reliably use LLMs as part of a larger pipeline or process (such as an agent with tool-calling) without failures due to syntax errors or erroneous "Sure! Here's your output formatted as JSON with no other text or preamble" messages thrown in.

Your LLM output might not be correct. But grammars ensure that your LLM output is at least _syntactically_ correct. It's not everything, but it's not nothing. And especially if we want to get away from cloud deployments and run effective local models, grammars are an incredibly valuable piece of this.

For a practical example, I often think of Jart's simple LLM-based spam filter running on a Raspberry Pi [0]:

    llamafile -m TinyLlama-1.1B-Chat-v1.0.f16.gguf \
      --grammar 'root ::= "yes" | "no"' --temp 0 -c 0 \
      --no-display-prompt --log-disable -p "<|user|>
    Can you say for certain that the following email is spam? ...

Even though it's a super-tiny piece of hardware, by including a grammar that constrains the output to only ever be "yes" or "no" (it's impossible for the system to produce a different result), she can use a super-small model on super-limited hardware and it is still useful. It might not correctly identify spam, but it's never going to break for syntactic reasons, which gives a great boost to the usefulness of small, local models.

[0]: https://justine.lol/matmul/
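To make the mechanism concrete, here is a minimal sketch of the masking idea behind that grammar (a toy vocabulary stands in for a real tokenizer; the names VOCAB, ALLOWED, and constrained_greedy_step are illustrative, not llama.cpp API): at each decoding step, every token the grammar forbids has its logit set to -inf before the next token is chosen.

    import math

    VOCAB = ["yes", "no", "maybe", "Sure!", "{"]   # toy token vocabulary
    ALLOWED = {"yes", "no"}                        # root ::= "yes" | "no"

    def constrained_greedy_step(logits):
        """Mask grammar-forbidden tokens to -inf, then pick the
        highest-scoring survivor (greedy decoding, i.e. --temp 0)."""
        masked = [l if VOCAB[i] in ALLOWED else -math.inf
                  for i, l in enumerate(logits)]
        return VOCAB[max(range(len(masked)), key=masked.__getitem__)]

    # Even if the model strongly prefers a chatty preamble ("Sure!"),
    # the mask makes any output other than "yes"/"no" impossible.
    print(constrained_greedy_step([0.1, 0.2, 1.5, 3.0, 0.4]))  # -> "no"

A real engine does the same thing over the tokenizer's full vocabulary, with a grammar state machine that advances as tokens are emitted, but the masking step itself is this simple.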

▲ mcyc 3 hours ago
This is a fantastic guide! I did a lot of work on structured generation for my PhD. Here are a few other pointers for people who might be interested.

Some libraries:

- Outlines, a nice library for structured generation (a minimal usage sketch follows after this list)

- Guidance (already covered by FlyingLawnmower in this thread), another nice library

- XGrammar, a less-featureful but really well optimized constrained generation library

Some papers:

- Efficient Guided Generation for Large Language Models

- Automata-based constraints for language model decoding

- Pitfalls, Subtleties, and Techniques in Automata-Based Subword-Level Constrained Generation

Some blog posts:

- Fast, High-Fidelity LLM Decoding with Regex Constraints

- Coalescence: making LLM inference 5x faster
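To give a flavor of what these libraries look like in use, here is a minimal sketch in the style of the Outlines 0.x API (the model name and prompt are placeholders I picked, and the API has evolved across versions, so treat this as illustrative rather than copy-paste ready):

    import outlines

    # Load a small chat model through the Hugging Face transformers backend.
    model = outlines.models.transformers("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Constrain generation to a fixed set of choices -- the library
    # equivalent of the grammar root ::= "yes" | "no".
    generator = outlines.generate.choice(model, ["yes", "no"])

    answer = generator("Is this email spam? 'You won a prize, click here!' Answer:")
    print(answer)  # guaranteed to be exactly "yes" or "no"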

▲ FlyingLawnmower 3 hours ago

Very nicely written guide! If the authors or readers are interested in some of the more technical details of how we optimized guidance & llguidance, we wrote up a little paper about it here: https://guidance-ai.github.io/llguidance/llg-go-brrr

▲ bandrami an hour ago
These are cool tricks, but this seems like an impedance mismatch: why would you use an LLM (a probabilistic source of plausible text) in a situation where you want a deterministic source of text, where plausibility is not enough?

▲ tehnub 5 hours ago
This is a nice guide. I especially like the masked decoding diagrams on this page: https://nanonets.com/cookbooks/structured-llm-outputs/basic-...

edit: Somehow that link doesn't work... It's the diagram on the "constrained method" page.

▲ dfajgljsldkjag 4 hours ago
I agree that building agents is basically impossible if you cannot trust the model to output valid JSON every time. This seems like a decent collection of the current techniques we have to force deterministic structure for production systems.

▲ shmolyneaux 4 hours ago
Are there output formats that are more reliable (better adherence to the schema, easier to get parseable output) or cheaper (fewer tokens) than JSON? YAML has its own problems and TOML isn't widely adopted, but they both seem like they would be easier to generate. What have folks tried?

▲ roywiggins 4 hours ago
> We use a lenient parser like ast.literal_eval instead of the standard json.loads(). It will handle outputs that deviate from strict JSON format (single quotes, trailing commas, etc.)

A nitpick: that's probably a good idea and I've used it before, but that's not really a lenient JSON parser, it's a Python literal parser, and they happen to be close enough that it's useful.
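A quick standard-library illustration of that gap (both behaviors are documented; the sample strings are mine):

    import ast
    import json

    # Python-literal syntax that strict JSON rejects:
    # single quotes, a trailing comma, capitalized True.
    s = "{'spam': True, 'score': 0.9,}"
    print(ast.literal_eval(s))   # -> {'spam': True, 'score': 0.9}
    # json.loads(s) raises json.JSONDecodeError on the same string.

    # Valid JSON that ast.literal_eval rejects: true and null are not
    # Python literals, so the "lenient" parser fails on strict JSON.
    t = '{"spam": true, "score": null}'
    print(json.loads(t))         # -> {'spam': True, 'score': None}
    # ast.literal_eval(t) raises ValueError on the same string.

So it's lenient along some axes (Python-isms) and stricter along others (JSON keywords), which is worth knowing before relying on it.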

▲ kylecazar 2 hours ago
This information is presented really well. I subscribed to your newsletter. Thanks!

▲ Imanari 3 hours ago
I like structured outputs as much as the next guy, but be careful not to try to force structure onto natural language.

▲ fabiensanglard 3 hours ago
What would be the point of outputting unconstrained JSON if the output is consumed by a human?

▲ maxdo 2 hours ago
Huge fan of BAML, nice coverage.

▲ earth2mars 4 hours ago
BAML