mfro 11 hours ago
Strangely, it is super fast on my 16 Plus, but with longer messages it can slow down a LOT, and not because of thermal throttling. I wish I could see some diagnostic data.
steve-atx-7600 11 hours ago
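LLM inference is O(tokens^2) in the attention step: each token attends to every token before it, so the work grows much faster than the message length.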
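To make the quadratic claim concrete, here is a rough sketch that just counts query-key comparisons for standard causal self-attention. The function names are made up for illustration, and it assumes a plain transformer with no sliding-window or sparse attention; it also ignores the feed-forward layers, which scale linearly with token count, so treat it as a back-of-the-envelope model rather than a profile of any real app.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Count query-key comparisons for causal self-attention (one head, one layer).

    Token i attends to itself and the i - 1 tokens before it, so the total is
    1 + 2 + ... + n = n * (n + 1) / 2, which grows as O(n^2).
    """
    return sum(position for position in range(1, num_tokens + 1))


def per_token_decode_cost(cached_tokens: int) -> int:
    """Comparisons needed to generate one more token when earlier keys are cached.

    The new token is compared against every cached token plus itself, so each
    generated token gets more expensive as the conversation grows.
    """
    return cached_tokens + 1


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, attention_pair_count(n))
```

Under these assumptions, going from 1000 to 2000 tokens takes the count from 500500 to 2001000, so doubling the message length roughly quadruples the attention work, which would fit the slowdown on longer messages.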