Remix.run Logo
apwheele 5 hours ago

I think XML is good to know for prompting (similar to how <think></think> was popular for outputs, you can do that for other sections). But I have had much better experience just writing JSON and using line breaks, colons, etc. to demarcate sections.

E.g. instead of

    <examples>
      <ex1>
        <input>....</input>
        <output>.....</output>
      </ex1>
      <ex2>....</ex2>
      ...
    </examples>
    <instructions>....</instructions>
    <input>{actual input}</input>
Just doing something like:

    ...instructions...
    input: ....
    output: {..json here}
    ...maybe further instructions...
    input: {actual input}
Use case document processing/extraction (both with Haiku and OpenAI models), the latter example works much better than the XML.

N of 1 anecdote anyway for one use case.

galaxyLogic 3 hours ago | parent | next [-]

XML helps because it a) Lets you to describe structures b) Make a clear context-change which make it clear you are not "talking in XML" you are "talking about XML".

I assume you are right too, JSON is a less verbose format which allows you to express any structure you can express in XML, and should be as easy for AI to parse. Although that probably depends on the training data too.

I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.

Again, depends on what the AI has been trained on.

I would go with JSON, or some version of it which would also allow comments.

ekjhgkejhgk 5 hours ago | parent | prev | next [-]

Could you clarify, do those tags need to be tags which exist and we need to lear about them and how to use them? Or we can put inside them whatever we want and just by virtue of being tags, Claude understands them in a special way?

ezfe 5 hours ago | parent | next [-]

They probably don’t need to be specific values. The model is fine tuned to see the tags as signals and then interprets them

galaxyLogic 3 hours ago | parent [-]

If it walks like a duck ... AI thinks it is something like a duck.

apwheele 5 hours ago | parent | prev [-]

All the major foundation models will understand them implicitly, so it was popular to use <think>, but you could also use <reason> or <thinkhard> and the model would still go through the same process.

cyanydeez 2 hours ago | parent [-]

<ponderforamoment>HTML is a large subsection of their training data, so they're used to seeing a somewhat semantic worldview</ponderforamoment>

marxisttemp 3 hours ago | parent | prev [-]

XML is much more readable than JSON, especially if your data has characters that are meaningful JSON syntax

galaxyLogic 3 hours ago | parent [-]

I think readability is in the eye of the reader. JSON is less verbose, no ending tags everywhere, which I think makes it more readable than XML.

But I'd be happy to hear about studies that show evidence for XML being more readable, than JSON.

ezfe 28 minutes ago | parent [-]

I disagree that XML is more readable in general, but for the purpose of tagging blocks of text as <important>important</important> in freeform writing, JSON is basically useless