190n 2 hours ago
> *2-bit quantization produces \name\ instead of "name" in JSON output, making tool calling unreliable.*

I was wondering about that statement. Shouldn't it restrict sampling to only tokens that produce valid JSON matching the schema during a tool call? On the other hand, I have heard a lot about how even production LLM providers don't always call tools accurately, so I suppose either it's hard to implement what I described or there's something I haven't thought of that makes it impossible.
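To make the idea concrete, here is a minimal sketch of what restricting sampling could look like. Everything in it is a made-up toy: `VOCAB`, `VALID_OUTPUTS`, `is_valid_prefix`, and `broken_model` are hypothetical stand-ins, and the "schema" is just a list of allowed strings rather than a real JSON Schema or grammar. It is not how any particular inference engine does it, only an illustration of masking out tokens that would break the output.

```python
import json
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ['{', '}', '"', '\\', 'name', ':', 'x']

# Hypothetical stand-in for a schema: the only output this toy tool call accepts.
VALID_OUTPUTS = ['{"name":"x"}']


def is_valid_prefix(text):
    """True if text can still be extended into one of the accepted outputs."""
    return any(valid.startswith(text) for valid in VALID_OUTPUTS)


def unconstrained_greedy_step(logits):
    """Plain greedy decoding: take the highest-scoring token, valid or not."""
    best_index = max(range(len(VOCAB)), key=lambda i: logits[i])
    return VOCAB[best_index]


def constrained_greedy_step(prefix, logits):
    """Greedy decoding restricted to tokens that keep the output a valid prefix."""
    best_token, best_logit = None, -math.inf
    for token, logit in zip(VOCAB, logits):
        if is_valid_prefix(prefix + token) and logit > best_logit:
            best_token, best_logit = token, logit
    return best_token


def decode(logits_fn, max_steps=16):
    """Generate token by token until the output is one of the accepted strings."""
    out = ""
    for _ in range(max_steps):
        if out in VALID_OUTPUTS:
            break
        token = constrained_greedy_step(out, logits_fn(out))
        if token is None:
            raise ValueError("no token keeps the output valid")
        out += token
    return out


def broken_model(prefix):
    """Stand-in for a badly quantized model that always prefers a backslash."""
    return [0.5, 0.4, 1.0, 5.0, 0.8, 0.3, 0.2]


if __name__ == "__main__":
    print(unconstrained_greedy_step(broken_model("")))  # the backslash token
    print(json.loads(decode(broken_model)))
```

Even though the toy model scores the backslash highest at every step, the masked decoder can only ever emit the accepted string. Note that this only guarantees well-formed output: the mask cannot tell whether the values the model picks inside a valid structure are the right ones.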