philipkiely 8 days ago
Yeah, the custom hardware providers are super good at TPS. Kudos to their teams for sure, and the demos of instant reasoning are incredibly impressive. That said, we are serving the model at its full 131K-token context window, and they are serving 33K max, which could matter for some edge-case prompts. Additionally, NVIDIA hardware is much more widely available if you are scaling a high-traffic application.