TSiege 4 hours ago
Latency absolutely matters? This is such a weird thing to say. For training, sure, but customers absolutely want low latency.
electroly 4 hours ago
They want it, sure. Customers want everything if it's free, but this is about what they value with their money. In this thought experiment, you're Anthropic, not the customer, and you're making the choice that's best for Anthropic. Will Anthropic lose customers because the latency is higher? No way. Customers want low cost and lots of usage more than they want low latency. In a cutthroat race to the bottom, there's no room to "give away" massively expensive freebies like a data center near every population center when the customer doesn't value those extras with actual money.

It's the same reason we all tolerate the relatively slow batched token generation rate: batching dramatically lowers the cost, and we need low-cost inference more than we want fast generation. If the cost goes up, we'll actually leave, for real. After the initial announcement of "fast mode" in Claude Code, did you ever hear about anyone using it? I didn't. Vanishingly few people are willing to pay extra for faster inference.

Remember that time-to-first-token is dominated by the time to process the prompt, which is well over an order of magnitude more latency than the network route adds. An extra 200 milliseconds of network delay on a 5-10 second time-to-first-token isn't even noticeable; it's within normal TTFT jitter. It would be foolish to spend billions of dollars dropping data centers around the world to shave the 200 milliseconds when that does nothing about the 5-10 seconds. Skip the exotic locales and put your data centers in Cheap Power Tax Haven County, USA. Perhaps run the numbers and see if Free Cooling City, Sweden is cheaper.
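The arithmetic is easy to sanity-check in a few lines of Python; every number below is an illustrative assumption taken from the figures above, not a measurement:

    # How much does an extra network hop add to time-to-first-token (TTFT)?
    prompt_ttft_s = 7.0      # assumed TTFT for a long prompt (middle of the 5-10 s range)
    ttft_jitter_s = 0.5      # assumed run-to-run TTFT variation
    extra_network_s = 0.2    # extra delay from a far-away data center

    share = extra_network_s / prompt_ttft_s
    print(f"network share of TTFT: {share:.1%}")                        # ~2.9%
    print(f"exceeds normal jitter: {extra_network_s > ttft_jitter_s}")  # False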
CuriouslyC 3 hours ago
The only AI use case that cares about latency is interactive voice agents, where you ideally want a <200ms response time, and 100ms of network latency kills that. For coding and batch-job agents, anything under 1s isn't going to matter to the user.
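A rough budget makes the voice case concrete; every number here is an assumption for illustration, not a benchmark:

    # Latency budget for an interactive voice agent.
    target_s = 0.200       # ideal end-to-end response time
    network_rtt_s = 0.100  # round trip to a distant region
    asr_s = 0.050          # speech-to-text (assumed)
    tts_s = 0.050          # time to first synthesized audio (assumed)

    model_budget_s = target_s - network_rtt_s - asr_s - tts_s
    print(f"left for model inference: {model_budget_s * 1000:.0f} ms")  # 0 ms: budget blown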
blmarket 2 hours ago
Easy solution: use hyperscalers, with their super-expensive API charges, only when latency really matters, and otherwise build your own DC. It's safe to expect customers don't care about latency that much relative to cost.
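In code, that hybrid strategy is just a routing decision; the backend names here are hypothetical:

    # Send latency-sensitive traffic to the (pricier) hyperscaler,
    # everything else to the owned, cheap-power data center.
    def pick_backend(latency_sensitive: bool) -> str:
        return "hyperscaler-low-latency" if latency_sensitive else "own-dc-cheap-power"

    print(pick_backend(True))   # hyperscaler-low-latency
    print(pick_backend(False))  # own-dc-cheap-power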