How does the token usage compare to vanilla structured output? Many of these libraries make multiple requests to constrain the output and measure logprobs.