dvt 15 hours ago:
What excites me most about these new four-figure-tokens/second models is that you can essentially do multi-shot prompting (+ nudging) without the user even feeling it, potentially fixing some of the weird hallucinatory/non-deterministic behavior we sometimes end up with.
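The multi-shot-plus-nudging idea can be sketched as a resample-and-validate loop. This is a minimal illustration, not any vendor's API: `generate()` and `passes_check()` are hypothetical stand-ins for a real inference call and an output validator.

```python
# Hypothetical sketch of multi-shot prompting with a fast model.
# generate() and passes_check() are stand-ins for a real inference
# API and a validator -- both are assumptions for illustration.

def generate(prompt: str, attempt: int) -> str:
    """Stub model: fails validation until the nudged third attempt."""
    return "valid JSON" if attempt >= 2 else "malformed output"

def passes_check(output: str) -> bool:
    """Stub validator, e.g. a JSON-schema or unit-test check."""
    return output == "valid JSON"

def multi_shot(prompt: str, max_attempts: int = 5) -> str:
    """Resample with a corrective nudge until the output validates.

    At 1000+ tok/s, several attempts can still finish within the
    latency budget of a single attempt on a slower model, so the
    user never notices the retries."""
    for attempt in range(max_attempts):
        output = generate(prompt, attempt)
        if passes_check(output):
            return output
        # "Nudge": feed the rejected output back into the prompt.
        prompt += f"\nPrevious attempt was rejected: {output!r}. Fix it."
    raise RuntimeError("no valid output within the attempt budget")

print(multi_shot("Emit valid JSON for the request"))
```

With the stub model above, the loop nudges twice and succeeds on the third attempt.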
volodia 13 hours ago:
That is also our view! We see Mercury 2 as enabling very fast iteration for agentic tasks. A single shot at a problem might be less accurate, but because the model has a shorter execution time, it enables users to iterate much more quickly. | ||||||||
lostmsu 11 hours ago:
Regular models are very fast if you do batch inference. GPT-OSS 20B gets close to 2k tok/s on a single 3090 at bs=64 (might be misremembering details here). | ||||||||
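Worth noting the distinction between aggregate and per-stream throughput here. Taking the comment's own (self-admittedly approximate) numbers, a quick back-of-envelope calculation:

```python
# Back-of-envelope split of the batch-throughput claim: ~2000 tok/s
# aggregate at batch size 64 is high total throughput but modest
# per-request speed. Numbers are from the comment above, which its
# author flags as possibly misremembered.

aggregate_tok_s = 2000
batch_size = 64

per_stream = aggregate_tok_s / batch_size
print(f"per-stream: {per_stream:.1f} tok/s")  # ~31.2 tok/s per request
```

So batching buys total throughput, not the single-stream latency that the parent comments are excited about.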