raincole | 7 hours ago
When it comes to LLMs you really cannot draw conclusions from first principles like this. Yes, it sounds reasonable. But things in reality aren't always reasonable. Benchmark or nothing.
samus | 7 hours ago | parent
There have been papers about introducing thinking tokens in intermediate layers that get stripped from the output.
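For illustration, here is a minimal sketch of the stripping step only (the token IDs and marker names are hypothetical, not from any specific paper): special thinking tokens are emitted during generation, and spans between the markers are filtered out before detokenization so the user never sees them.

```python
# Hypothetical special-token IDs marking a "thinking" span; the papers'
# actual mechanisms (e.g. inserting tokens at intermediate layers) differ.
THINK_START, THINK_END = 50257, 50258

def strip_thinking_tokens(token_ids):
    """Remove THINK_START..THINK_END spans, markers included,
    from a generated token-ID sequence."""
    out, in_think = [], False
    for tok in token_ids:
        if tok == THINK_START:
            in_think = True
        elif tok == THINK_END:
            in_think = False
        elif not in_think:
            out.append(tok)
    return out

generated = [10, 20, THINK_START, 99, 98, THINK_END, 30]
print(strip_thinking_tokens(generated))  # → [10, 20, 30]
```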