roflcopter69 a day ago
I'd be really interested in what you mean. Are there any studies that quantify this difference in model performance when using JSON versus XML? What's a good intuition for why there might be a big difference? If XML is better than JSON for LLMs, why isn't everyone and their grandma recommending that I use XML instead of JSON? Why does the Google Gemini API offer structured output only with JSON Schema rather than XML Schema?
simonw a day ago
I don't know if the "XML is better than JSON" thing still holds with this year's frontier models, but it was definitely a thing last year. Here's Anthropic's documentation about it: https://docs.claude.com/en/docs/build-with-claude/prompt-eng... Note that they don't actually suggest the XML needs to be VALID! My guess was that JSON requires more characters to be escaped than XML-ish syntax does, plus matching opening and closing tags make it a little easier for the LLM to keep track of which string corresponds to which key.
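A minimal sketch of the escaping point, assuming Python's standard `json` module as the serializer. The sample text and the `as_json`/`as_tags` helpers are hypothetical. It only shows the string-level difference; it says nothing about how a given model tokenizes or attends to either form:

```python
import json

# Hypothetical sample text: multi-line, with quotes and a backslash,
# the kind of content that often ends up inside a prompt.
text = 'He said "use C:\\temp"\nthen left.\n'

def as_json(key: str, value: str) -> str:
    """Wrap a value as a one-key JSON object (standard JSON string escaping)."""
    return json.dumps({key: value})

def as_tags(key: str, value: str) -> str:
    """Wrap a value in XML-ish tags with no escaping at all, mirroring the
    'does not need to be valid XML' point (assumes the value never contains
    the closing tag itself)."""
    return f"<{key}>\n{value}</{key}>"

json_form = as_json("instructions", text)
tag_form = as_tags("instructions", text)

# Backslashes added by JSON escaping, beyond the one already in the text:
# each quote, the backslash itself, and each newline gains one.
extra_backslashes = json_form.count("\\") - text.count("\\")

print(json_form)
print(tag_form)
print("extra backslashes in JSON:", extra_backslashes)
```

The tag form leaves the text byte-for-byte as written, while the JSON form turns one line break into a two-character `\n` and doubles up quotes and backslashes.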
samuelknight a day ago
(1) JSON requires lots of escape characters, including hex escapes, that mangle the strings, and (2) it's much easier for model attention to track where a semantic block begins and ends when it's wrapped in the name of that section. <instructions> ... ... </instructions> can be much easier than { "instructions": "..\n...\n" }, especially when there are newlines, quotes and unicode.
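To make the "hex escapes" point concrete, here is a small sketch, again assuming Python's `json` module (other serializers may behave differently). `json.dumps` defaults to `ensure_ascii=True`, which rewrites non-ASCII characters as `\uXXXX` sequences; the sample string is hypothetical:

```python
import json

# Hypothetical text: "Café déjà vu" plus an emoji (U+1F642), written with
# escapes here only so the source file stays pure ASCII.
text = "Caf\u00e9 d\u00e9j\u00e0 vu \U0001F642"

# Default ensure_ascii=True: each non-ASCII character becomes \uXXXX, and a
# character outside the Basic Multilingual Plane (like this emoji) becomes a
# UTF-16 surrogate pair, i.e. two escapes.
escaped = json.dumps({"instructions": text})

# ensure_ascii=False keeps the characters as-is; quotes, backslashes and
# newlines would still be escaped, though.
unescaped = json.dumps({"instructions": text}, ensure_ascii=False)

print(escaped)
print(unescaped)
print("\\u escapes:", escaped.count("\\u"))
```

Both strings decode to the same value, so the difference is purely in what the model would read: `Caf\u00e9 ... \ud83d\ude42` versus the original characters.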