eldenring 3 days ago
Serving a model efficiently at 1M context is difficult, and it could be much more expensive and numerically tricky. I'm guessing they were working on serving it properly, since it's the same "model" in scores and such.
simianwords 3 days ago
Thanks, but it's still not clear what they actually did. Some inference-time hacks?