▲ | mark_l_watson 5 days ago
It is such a common pattern for LLMs to surround generated JSON with ```json … ``` that I check for this at the application level and fix it. Ten years ago I would do the same sort of sanity checks on formatting when I used LSTMs to generate synthetic data.
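A minimal sketch of that application-level fix-up (names are illustrative): strip the fence if the model added one, then parse.

```python
import json
import re

def parse_llm_json(text: str):
    """Parse LLM output, stripping a Markdown code fence if one is present."""
    text = text.strip()
    # Matches ```json ... ``` or bare ``` ... ``` wrappers around the payload.
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```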
▲ | mpartel 5 days ago | parent | next [-]
Some LLM APIs let you give a schema or regex for the answer. I think it works because LLMs give a probability for every possible next token, and you can filter that list by what the schema/regex allows next.
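A toy illustration of that filtering idea, using the third-party `regex` module's partial matching to decide which tokens could still lead to a valid completion (the vocabulary and probabilities here are made up):

```python
import regex  # third-party "regex" module; supports partial matching

PATTERN = regex.compile(r"\d+")  # stand-in "schema": output must be digits

def allowed(prefix: str, token: str) -> bool:
    # Keep a token if prefix+token already matches the pattern, or could
    # still be extended into a match (a "partial" match).
    return PATTERN.fullmatch(prefix + token, partial=True) is not None

# Made-up next-token distribution from the model.
probs = {"4": 0.3, "2": 0.2, "a": 0.25, "{": 0.15, "7": 0.1}
prefix = "1"

masked = {t: p for t, p in probs.items() if allowed(prefix, t)}
total = sum(masked.values())
masked = {t: p / total for t, p in masked.items()}  # renormalize
print(masked)  # only "4", "2", "7" survive, with probabilities ~0.5/0.33/0.17
```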
▲ | viridian 5 days ago | parent | prev | next [-]
I'm sure the reason is the plethora of markdown data it was trained on. I personally use ``` stuff.txt ``` extremely frequently, in a variety of places. In Slack/Teams I wrap anything someone might copy and paste, to ensure the chat client doesn't do something horrendous like replace my ASCII double quotes with the fancy Unicode ones that cause syntax errors. In readme files, any example path, code, YAML, or JSON is wrapped in code quotes. In my personal (text file) notes I also use ``` {} ``` to denote a code block I'd like to remember, just out of habit from the other two above.
▲ | fumeux_fume 5 days ago | parent | prev | next [-]
Very common struggle, but a great way to prevent it is prefilling the assistant response with "{", or with as much of the JSON output as you know ahead of time, like '{"response": ['
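For example, with Anthropic's messages API (model name illustrative), the prefill goes in as a partial assistant turn and the completion continues from it:

```python
import anthropic

client = anthropic.Anthropic()
PREFILL = '{"response": ['

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "List three colors as JSON."},
        # The trailing assistant turn is the prefill; the model continues
        # generating directly from here, so no room for a leading fence.
        {"role": "assistant", "content": PREFILL},
    ],
)
# Reassemble the full document: prefill + continuation.
json_text = PREFILL + resp.content[0].text
```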
▲ | Alifatisk 5 days ago | parent | prev | next [-]
I think this is the first time I've stumbled upon someone who actually mentions LSTMs in a practical way instead of just theory. Cool! Would you like to elaborate on how the experience was? What was your approach for using it? How did you generate the synthetic data? How did it perform?
▲ | freehorse 5 days ago | parent | prev | next [-]
I had similar issues with local models. I ended up actually requesting the backticks, because it was easier that way, and parsed the output accordingly. I cached a prompt with explicit examples of how to structure the data, and reused it over and over. I have found that without examples in the prompt some LLMs are very unreliable, but with cached example prompts this becomes a non-issue.
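A sketch of that setup (prompt wording and names are illustrative): one reusable, cacheable prompt showing the exact fenced shape, plus a parser that insists on the fence.

```python
import re

FENCE = "`" * 3  # avoids nesting literal fences inside this example

# Reusable (and cacheable) instructions with an explicit example of the
# expected structure.
SYSTEM_PROMPT = (
    "Reply with JSON inside a fenced code block, exactly like:\n"
    f"{FENCE}json\n"
    '{"title": "Example", "tags": ["a", "b"]}\n'
    f"{FENCE}"
)

def extract_fenced(reply: str) -> str:
    # Take the first fenced block; fail loudly if the model skipped it.
    m = re.search(r"```(?:json)?\s*(.*?)\s*```", reply, re.DOTALL)
    if m is None:
        raise ValueError("no fenced block in model output")
    return m.group(1)
```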
▲ | Alifatisk 5 days ago | parent | prev | next [-]
I do use backticks a lot when sharing examples in different formats with LLMs, and I have instructed them to do likewise; I also upvote whenever they respond in that manner. I got this format from writing Markdown files. It's a nice way to share examples and also specify which format they're in.
▲ | mejutoco 5 days ago | parent | prev | next [-]
Funny, I do the same. Additionally, one can define a JSON schema for the output, then try to load the response as JSON and retry a number of times. If it is not valid JSON, or the schema is not followed, we discard it and retry. It also helps to have a field in the JSON for confidence, or a similar pattern, to act as a cutoff for which responses are accepted.
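A sketch of that loop, assuming a placeholder ask_llm() callable and the jsonschema package; the schema and threshold are illustrative:

```python
import json
import jsonschema  # pip install jsonschema

SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["answer", "confidence"],
}

def ask_with_retries(ask_llm, prompt, retries=3, min_confidence=0.7):
    """ask_llm is whatever callable hits your model (placeholder)."""
    for _ in range(retries):
        try:
            data = json.loads(ask_llm(prompt))
            jsonschema.validate(data, SCHEMA)
        except (json.JSONDecodeError, jsonschema.ValidationError):
            continue  # not valid JSON, or off-schema: discard and retry
        if data["confidence"] >= min_confidence:
            return data  # accepted
    return None  # every attempt was discarded
```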
▲ | tosh 5 days ago | parent | prev | next [-]
I think most mainstream APIs by now have a way for you to constrain the generated answer to a schema.
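For instance, OpenAI's structured outputs accept a JSON schema directly (interface as of this writing; model name illustrative, and other providers expose similar parameters):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # illustrative model name
    messages=[{"role": "user", "content": "List three colors."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["colors"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # conforms to the schema, no fences
```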
▲ | barrell 5 days ago | parent | prev [-]
Yeah, that's infuriating. They're getting better now with structured data, but it's going to be a never-ending battle getting reliable data structures from an LLM. My issue is maybe more, maybe less insidious: the model will literally insert a random character into the middle of a word. I work with an app that supports 120+ languages, though. I give the LLM translations, transliterations, grammar features, etc., and ask it to explain them in plain English, so it's constantly switching between multiple real, and sometimes fake (transliteration), languages. I don't think most users would experience this.