xg15 3 days ago

Out of curiosity, why is it so expensive? Shouldn't constraining the possible result tokens make inference less expensive? (You would have to calculate fewer logits, and could occasionally even skip tokens entirely if there is only one valid option.)

red2awn 2 days ago

Tokens are sampled from the logits, with the constraints applied, after a normal forward pass. The forward pass is the expensive part of LLM inference, and structured output doesn't affect it.
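A minimal sketch of that ordering, assuming the model has already produced a plain list of logits and a constraint engine hands over the set of allowed token ids (both are illustrative stand-ins, not a specific library's interface). The mask is applied only after the logits exist:

```python
import math
import random


def constrained_sample(logits, allowed_ids, rng=random):
    """Sample a token id from `logits`, restricted to `allowed_ids`.

    `logits` is the full vector from an ordinary forward pass; the
    constraint only masks it afterwards, so the forward pass costs the same.
    """
    if not allowed_ids:
        raise ValueError("the constraint allows no token at this position")
    # Disallowed tokens get -inf, i.e. zero probability after the softmax.
    masked = [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]
    # Numerically stable softmax weights (exp(-inf) evaluates to 0.0).
    top = max(masked)
    weights = [math.exp(x - top) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
print(constrained_sample([5.0, 1.0, 3.0], {1, 2}, rng))  # 1 or 2, never 0
```

Token 0 has the highest logit but can never be picked, because the mask zeroes its weight before sampling.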

xg15 2 days ago

Yes, but if the constraints only permit a single valid token anyway for some positions, you could skip the forward pass entirely for those positions and just return that token.
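The first idea could look roughly like the decode loop below, where a position with exactly one permitted token is filled without calling the model. `forward` and `allowed_next` are hypothetical hooks standing in for the model and the constraint engine, and greedy selection is used only to keep the sketch short:

```python
def generate(forward, allowed_next, prompt_ids, max_new_tokens, eos_id):
    """Greedy constrained decoding that skips the model when only one token fits.

    `forward(ids)` returns next-position logits and `allowed_next(ids)` returns
    the set of token ids the constraint permits; both are hypothetical hooks.
    Returns the generated ids and how many forward passes were actually run.
    """
    ids = list(prompt_ids)
    forward_passes = 0
    for _ in range(max_new_tokens):
        allowed = allowed_next(ids)
        if len(allowed) == 1:
            # Forced token: nothing to choose, so no forward pass is needed.
            (token,) = allowed
        else:
            logits = forward(ids)
            forward_passes += 1
            # Greedy pick among the tokens the constraint allows.
            token = max(allowed, key=lambda t: logits[t])
        ids.append(token)
        if token == eos_id:
            break
    return ids[len(prompt_ids):], forward_passes


# Toy setup: token 3 is end-of-sequence and the "grammar" is a fixed schedule.
schedule = [{1}, {0, 2}, {1}, {3}]
out, passes = generate(
    forward=lambda ids: [0.0, 0.0, 5.0, 0.0],
    allowed_next=lambda ids: schedule[len(ids)],
    prompt_ids=[],
    max_new_tokens=10,
    eos_id=3,
)
print(out, passes)  # [1, 2, 1, 3] 1
```

Only one of the four tokens needed the model here. In a real transformer the forced tokens usually still have to be fed through the model so that its key/value cache covers them, but several of them can be processed together in one pass, the way a prompt is, rather than one pass each.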

The other idea was a bit more theoretical: if you know only a handful of tokens are valid, then calculating the logits of the other tokens in the forward pass is wasteful, as they won't affect the sampling process. However, it's probably not worth the cost to optimize that, as it only affects the last layer and might be mostly amortized by SIMD parallel processing anyway.
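To make the second idea concrete: the last step of the forward pass multiplies the final hidden state by the unembedding (output projection) matrix, which has one row per vocabulary token. A sketch that computes dot products only for the rows of allowed tokens, assuming plain Python lists in place of real tensors:

```python
def logits_for(hidden, unembedding, allowed_ids):
    """Final-layer logits computed only for the allowed token ids.

    `hidden` is the last hidden state (length d); `unembedding[t]` is the
    length-d output row for vocabulary token t. The cost is
    len(allowed_ids) * d multiply-adds instead of vocab_size * d.
    """
    return {
        t: sum(h * w for h, w in zip(hidden, unembedding[t]))
        for t in allowed_ids
    }


hidden = [1.0, 2.0]
unembedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-token vocabulary
print(logits_for(hidden, unembedding, {0, 2}))  # {0: 1.0, 2: 3.0}
```

Only this last matrix product shrinks; every earlier layer still runs in full, which is why the saving is likely small in practice.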

wat10000 3 days ago

Is there anything in the JSON grammar that only allows one valid option? In any case, I also don't understand why it would be costly. The fact that tokens are typically multiple characters would complicate things somewhat, but checking that a given token results in valid partial JSON doesn't seem too hard.
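One way to handle multi-character tokens is to keep an incremental character-level checker and accept a token only if every one of its characters is accepted in turn. The sketch below is deliberately simplified (it tracks only bracket nesting and string quoting, not full JSON syntax), and the state layout is an illustrative assumption:

```python
CLOSER_TO_OPENER = {"}": "{", "]": "["}


def step(state, ch):
    """Advance a simplified JSON-structure state by one character.

    `state` is (open_brackets, in_string, escaped). Returns the new state, or
    None if `ch` breaks bracket nesting. Only brackets and string quoting are
    tracked; numbers, literals, commas and colons are not validated.
    """
    stack, in_string, escaped = state
    if in_string:
        if escaped:
            return (stack, True, False)   # character after a backslash
        if ch == "\\":
            return (stack, True, True)
        return (stack, ch != '"', False)  # an unescaped quote closes the string
    if ch == '"':
        return (stack, True, False)
    if ch in "{[":
        return (stack + (ch,), False, False)
    if ch in "}]":
        if not stack or stack[-1] != CLOSER_TO_OPENER[ch]:
            return None
        return (stack[:-1], False, False)
    return state


def advance(state, text):
    """Feed a whole (multi-character) token; None means it is invalid here."""
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return state


START = ((), False, False)
state = advance(START, '{"a": [1')
print(advance(state, "]}") is not None)  # True: closes both brackets
print(advance(state, "}") is not None)   # False: "[" is still open
```

Checking every vocabulary entry this way at every step is the naive approach; practical implementations typically precompute or cache which tokens are valid in each checker state.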

xg15 3 days ago

Freeform JSON not so much I think, but if you combine it with a schema and strict whitespace/formatting rules, you could get quite a few.

I think there are lots of boilerplate sequences like '":{', '":[' or '", "', though they might already be compressed into a single token if the tokenizer was trained on enough JSON.

There are also situations where the schema would only allow a specific field name as the next token, e.g. if it was the only remaining valid and required field, or if fields have to be output in a specific order.
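A sketch of the two cases described above, for a hypothetical schema given as an ordered list of property names plus a set of required names (a simplification of JSON Schema's `properties` and `required`). How "ordered" output treats optional fields is an assumption here:

```python
def allowed_next_keys(properties, required, emitted, ordered):
    """Keys that may come next in an object, and whether it may close now.

    `properties` lists the schema's fields in declaration order, `required`
    is a set of field names, `emitted` lists the fields already written.
    In `ordered` mode fields must follow declaration order, and optional
    fields may be skipped (an assumption about how ordering is enforced).
    """
    if ordered:
        start = max((properties.index(k) + 1 for k in emitted), default=0)
        remaining = properties[start:]
    else:
        remaining = [p for p in properties if p not in emitted]
    may_close = not any(p in required for p in remaining)
    if not ordered:
        return remaining, may_close
    candidates = []
    for p in remaining:
        candidates.append(p)
        if p in required:
            break  # a required field cannot be skipped over
    return candidates, may_close


def forced_next_key(properties, required, emitted, ordered):
    """The single key the schema forces next, or None if there is a choice."""
    keys, may_close = allowed_next_keys(properties, required, emitted, ordered)
    return keys[0] if len(keys) == 1 and not may_close else None


props, req = ["id", "name", "tags"], {"id", "name"}
print(forced_next_key(props, req, ["id", "tags"], ordered=False))  # name
print(forced_next_key(props, req, ["id"], ordered=False))          # None
print(forced_next_key(props, req, ["id"], ordered=True))           # name
```

Whenever a key is forced, its whole quoted name and the following `":` could be emitted without sampling.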

wat10000 3 days ago

Good point, I momentarily forgot about the ability to specify a schema. In that case, you'd have a lot of places where there's only one possible output. Even if you have multiple fields and there are no ordering requirements, typical schemas won't take long to get to a unique field name prefix. If you've output `"us` then `ername"` is likely to be the only valid continuation in many cases.
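The `"us` example can be generalized: collect the allowed field names that match what has been emitted so far and take their longest common prefix; whatever extends past the emitted text is forced. A sketch, assuming keys are plain names without characters that need escaping:

```python
import os


def forced_continuation(prefix, keys):
    """Characters every still-valid quoted key agrees on after `prefix`.

    `prefix` is what has been emitted so far (e.g. '"us') and `keys` are the
    field names the schema still allows. An empty result means a real choice
    remains. os.path.commonprefix compares strings character by character.
    """
    candidates = ['"' + k + '"' for k in keys if ('"' + k + '"').startswith(prefix)]
    return os.path.commonprefix(candidates)[len(prefix):]


print(forced_continuation('"us', ["username", "email", "id"]))  # ername"
print(forced_continuation('"us', ["user_id", "user_name"]))      # er_
```

The result is a string of characters, so it still has to be turned into tokens before it can be appended, and the way a tokenizer splits it may differ from what the model would have produced token by token.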