aesthesia 3 hours ago
Thinking shouldn't be too hard to deal with: just let the model generate freely until it hits a </think> token, then do constrained decoding, right?
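Roughly what I have in mind, as a toy sketch in Python. Everything here is made up for illustration: `toy_logits` stands in for a real model, `yes_no_grammar` stands in for a real grammar, and none of these names come from llama.cpp or any other library. Greedy decoding is assumed to keep it short.

```python
THINK_END = "</think>"
EOS = "<eos>"


def two_phase_decode(next_logits, allowed_after, max_tokens=32):
    """Decode freely until THINK_END, then mask logits to the allowed set."""
    tokens = []
    answer = None  # None while thinking; a list of answer tokens afterwards
    for _ in range(max_tokens):
        logits = next_logits(tokens)  # dict: token -> score
        if answer is not None:
            # Constrained phase: drop every token the grammar forbids here.
            allowed = allowed_after(answer)
            logits = {t: s for t, s in logits.items() if t in allowed}
            if not logits:
                break
        token = max(logits, key=logits.get)  # greedy pick
        tokens.append(token)
        if token == EOS:
            break
        if answer is not None:
            answer.append(token)
        elif token == THINK_END:
            answer = []  # switch to the constrained phase
    return tokens


def toy_logits(tokens):
    """Hypothetical model: thinks for two tokens, then really wants 'banana'."""
    scores = {"hmm": 0.0, THINK_END: 0.0, "banana": 0.0,
              "yes": 0.0, "no": 0.0, EOS: 0.0}
    if THINK_END not in tokens:
        scores["hmm" if len(tokens) < 2 else THINK_END] = 5.0
    else:
        scores["banana"] = 5.0
        scores["yes"] = 3.0
        scores["no"] = 1.0
    return scores


def yes_no_grammar(answer):
    """Hypothetical grammar: exactly one of yes/no, then end of sequence."""
    return {"yes", "no"} if not answer else {EOS}


print(two_phase_decode(toy_logits, yes_no_grammar))
```

The thinking tokens never touch the mask, and the answer can only be something the grammar accepts, even though the toy model's unconstrained favorite is "banana". A real setup would also have to handle a model that never emits </think>, which this sketch only bounds with `max_tokens`.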
stymaar 13 minutes ago
Sure, but does llama-cpp support that?