| ▲ | qayxc 6 hours ago |
| This is an interesting observation. So maybe it has nothing to do with the model itself, but everything to do with external configuration. Token-limit exceeded -> empty output. Just a guess, though. |
|
| ▲ | embedding-shape 5 hours ago | parent | next [-] |
| > Token-limit exceeded -> empty output. Just a guess, though. |
|
| That'd be really non-obvious behavior. I'm not aware of any inference engine that works like that by default; usually you get everything up until the limit. Anything else would break the whole expectation of setting a token limit in the first place... |
| |
| ▲ | GrinningFool 3 hours ago | parent | next [-] |
| I just fixed this bug in a summarizer: reasoning tokens were consuming the budget I gave it (1k), so all I got back was a blank response. (Qwen3.5-35B-A3B) |
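A minimal sketch of the fix for that failure mode: reserve headroom for the hidden reasoning on top of the visible output you actually want. (The headroom figure here is a guess; real reasoning lengths vary widely by model and prompt, and whether reasoning tokens count against max_tokens at all depends on your engine.)

```python
def completion_budget(visible_output_tokens: int, reasoning_headroom: int = 2000) -> int:
    """On engines where reasoning tokens count against max_tokens
    (an assumption -- check your engine), reserve extra headroom
    for the thinking on top of the visible output you want back."""
    return visible_output_tokens + reasoning_headroom

# A 1k summary budget alone can be eaten entirely by reasoning;
# with headroom the request asks for 3000 tokens instead.
print(completion_budget(1000))
```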
| ▲ | embedding-shape 3 hours ago | parent [-] |
| Most inference engines would still return the reasoning tokens, though. Wouldn't you see that reasoning_content (or whatever your engine calls it) was filled while content wasn't? |
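A minimal sketch of that check, assuming an OpenAI-style response shape where the engine surfaces thinking under a reasoning_content field (the field name varies by engine, and not all engines return reasoning at all):

```python
def diagnose(choice: dict) -> str:
    """Classify an empty-looking chat completion choice: did the
    reasoning eat the token budget, or is the problem elsewhere?"""
    msg = choice.get("message", {})
    content = msg.get("content") or ""
    reasoning = msg.get("reasoning_content") or ""
    if not content and reasoning and choice.get("finish_reason") == "length":
        return "reasoning consumed the token budget"
    if not content and not reasoning:
        return "empty response (problem is elsewhere)"
    return "ok"

# Mocked choice, shaped like the summarizer bug described above:
# the budget ran out ("length") before any visible content appeared.
truncated = {
    "finish_reason": "length",
    "message": {"content": "", "reasoning_content": "Okay, the user wants a summary of..."},
}
print(diagnose(truncated))  # reasoning consumed the token budget
```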
| |
| ▲ | qayxc 5 hours ago | parent | prev [-] |
| This doesn't necessarily relate to the inference itself. No model is exposed to input directly when using web-based APIs; there are pre-processing layers involved that do undocumented things in opaque ways. |
|
|