pama 2 days ago
This sounds very reasonable to me. Consider top-tier labs that optimize inference and own their GPUs. At a data center paying 4 cents per kWh (which may be possible to arrange or beat in some US counties with special industrial contracts), USD 5000 of electricity can produce about 2 trillion tokens with the R1-0528 model, assuming a 120 kW draw for the GB200 NVL72 hardware and the (still to be fully optimized) sglang inference pipeline: https://lmsys.org/blog/2025-06-16-gb200-part-1/

2T tokens is not an unreasonable budget for getting high-precision answers to challenging math questions, but such a high token count would strongly suggest that these labs deploy a lot of unknown techniques. If one adds the cost of GPU ownership or rental, say 2 USD/h/GPU, the number of tokens for 5k USD shrinks dramatically to only about 66B. That is still high for the usual techniques that optimize for a single best answer at the end, but perhaps plausible if the vast majority of these are intermediate thinking tokens and a lot of the value comes from LLM-based verification.
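For anyone who wants to check the arithmetic, here is a rough sketch of the estimate. Two inputs are my assumptions and not stated in the comment: 72 GPUs per NVL72 rack, and a decode throughput of roughly 7,583 tokens/s per GPU, which is my reading of the figure reported in the linked blog post. I also assume the rental price is added on top of the electricity cost. Treat it as a back-of-the-envelope calculation, not a benchmark.

```python
# Back-of-the-envelope token budget for a fixed dollar amount.
BUDGET_USD = 5000.0
ELECTRICITY_USD_PER_KWH = 0.04   # from the comment
RACK_POWER_KW = 120.0            # from the comment
GPU_USD_PER_HOUR = 2.0           # from the comment (ownership or rental)

GPUS_PER_RACK = 72               # assumption: GPUs in one NVL72 rack
TOKENS_PER_SEC_PER_GPU = 7583.0  # assumption: decode throughput per GPU


def tokens_for_budget(budget_usd, gpu_usd_per_hour=0.0):
    """Tokens one rack can generate before the budget runs out."""
    electricity_per_hour = RACK_POWER_KW * ELECTRICITY_USD_PER_KWH
    hardware_per_hour = GPUS_PER_RACK * gpu_usd_per_hour
    hours = budget_usd / (electricity_per_hour + hardware_per_hour)
    tokens_per_hour = GPUS_PER_RACK * TOKENS_PER_SEC_PER_GPU * 3600
    return hours * tokens_per_hour


electricity_only = tokens_for_budget(BUDGET_USD)
with_rental = tokens_for_budget(BUDGET_USD, GPU_USD_PER_HOUR)

print(f"electricity only: {electricity_only / 1e12:.2f}T tokens")
print(f"with GPU rental:  {with_rental / 1e9:.0f}B tokens")
```

Under these assumptions the two results land close to the roughly 2T and 66B figures quoted above. The drop comes almost entirely from the hourly cost going from about 4.80 USD (electricity alone) to about 148.80 USD per rack.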