devashish86 3 days ago
Author here. Quick context the post doesn't quite spell out:

The tool_choice="auto" failure on Qwen3-Next isn't a parser issue: the model reasons inside <think>, decides, and never emits the tool call. No error, just empty tool_calls. The fix was swapping the backbone from Thinking to Instruct, not tuning any parser flag.

The "load the bigger model first, size the smaller against actual residency" playbook generalizes to anything with shared CUDA framework overhead. The ~5 GiB framework floor shows up even at small gpu_memory_utilization values, so plan against actuals, not targets.
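A rough sketch of the sizing arithmetic behind that playbook, in case it helps. The function name, the 1 GiB headroom, and the example numbers are hypothetical; the ~5 GiB floor is the figure quoted above. It also assumes the serving stack reads gpu_memory_utilization as a fraction of total device memory, and that you have already measured the bigger model's real residency (for example with nvidia-smi) after loading it.

```python
import math

GIB = 1024 ** 3


def size_second_model(total_bytes, first_resident_bytes,
                      framework_floor_bytes=5 * GIB,
                      headroom_bytes=1 * GIB):
    """Return a memory fraction for the second model.

    All names and defaults here are illustrative assumptions.
    first_resident_bytes is the *measured* residency of the bigger
    model after it has loaded, not the target it was configured with.
    """
    # What is actually left once the first model is resident.
    free_bytes = total_bytes - first_resident_bytes - headroom_bytes

    # The second process pays the framework floor before any weights
    # or KV cache fit, so a budget at or below the floor is unusable.
    if free_bytes <= framework_floor_bytes:
        raise ValueError("remaining memory does not clear the framework floor")

    # Round down to two decimals so the budget is never overstated.
    return math.floor(free_bytes / total_bytes * 100) / 100


if __name__ == "__main__":
    # Hypothetical numbers: 80 GiB card, first model measured at 50 GiB.
    print(size_second_model(80 * GIB, 50 * GIB))
```

The point of raising instead of returning a tiny fraction is that a budget below the floor fails at startup anyway, so it is cheaper to catch it in the planning step.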
edg5000 3 hours ago
From the Codex system prompt (verbatim):

```
(...)
- Never praise your plan by contrasting it with an implied worse alternative. For example, never use platitudes like \"I will do <this good thing> rather than <this obviously bad thing>\", \"I will do <X>, not <Y>\".
- Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
(...)
```

It seems the OpenAI people added that first bullet specifically to address the tendency the model has, as seen in the parent comment. The goblin stuff coincidentally appears right after it in the system prompt, so I included it as a bonus.
barrkel 5 hours ago
Can you try to tune your Claude, or whatever LLM you're using for your text, to phrase things in plain English? Way less use of antithesis, at least. You can probably find a skill for it; if not, get an LLM to write one for you.