Remix.run Logo
MuffinFlavored 7 hours ago

> Running DeepSeek V3 (685B) requires 8×H100 GPUs which is about $14k/month. Most developers only need 15-25 tok/s.

> deepseek-v3.2-685b, $40/mo/slot for ~20 tok/s, 465 slots total

> 465 users × 20 tok/s = 9,300 tok/s needed

> The node peaks at ~3,000 tok/s total. So at full capacity they can really only serve:

> 3,000 ÷ 20 = 150 concurrent users at 20 tok/s

> That's only 32% of the cohort being active simultaneously.

artificialprint 7 hours ago | parent [-]

People work 8 hours a day presumably, I guess they are banking on this idea

ycui1986 a minute ago | parent [-]

only works if the users are evenly distributed around the globe (which is likely more of less the case). if the user concentrates in on century, the token rate will be terrible.