blibble 7 days ago
> So I simultaneously can tell you that it's smart people really thinking about every facet of the problem, and I can't tell you much more than that.

"We do 1970s mainframe-style timesharing." There, that was easy.
kstrauser 7 days ago
For real. Say it takes one machine 5 seconds to produce a reply, and that a machine can only form one reply at a time (which I doubt, but for the sake of argument). If the requests were evenly spaced (they certainly won’t be, but again, for the sake of argument), then one machine could serve about 17,000 requests per day, or about 120,000 per week. At that rate, you’d need about 5,800 machines to serve 700M requests per week. That’s a lot to me, but not to someone who owns a data center. Yes, those 700M users will issue more than one query per week, and the queries won’t be evenly spaced. However, I’d bet most of those queries take well under a second to answer, and I’d also bet each machine can handle more than one at a time. It’s a large problem, to be sure, but it seems tractable.
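The back-of-envelope math above is easy to parameterize. Below is a rough Python sketch under the same assumptions (one query per weekly user, busy machines never idle); the function names, the `concurrent_replies` knob, and the `peak_factor` multiplier for uneven arrivals are all hypothetical, added only to show how the estimate moves. With 5-second replies the exact figures are 17,280 replies per day and 120,960 per week per machine, or roughly 5,800 machines for 700M weekly requests.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY  # 604,800


def replies_per_machine(seconds: int, seconds_per_reply: float,
                        concurrent_replies: int = 1) -> float:
    """Replies one machine can finish in `seconds` if it is never idle."""
    return seconds / seconds_per_reply * concurrent_replies


def machines_needed(requests_per_week: float, seconds_per_reply: float,
                    concurrent_replies: int = 1, peak_factor: float = 1.0) -> int:
    """Machines required to keep up with a weekly request volume.

    peak_factor (hypothetical) scales average load up to a peak load;
    1.0 means requests arrive perfectly evenly.
    """
    capacity = replies_per_machine(SECONDS_PER_WEEK, seconds_per_reply,
                                   concurrent_replies)
    return math.ceil(requests_per_week * peak_factor / capacity)


if __name__ == "__main__":
    print(replies_per_machine(SECONDS_PER_DAY, 5))   # 17280.0
    print(replies_per_machine(SECONDS_PER_WEEK, 5))  # 120960.0
    print(machines_needed(700_000_000, 5))           # 5788
    # Hypothetical: 1-second replies, 4 replies in flight per machine.
    print(machines_needed(700_000_000, 1, concurrent_replies=4))  # 290
```

Dropping the reply time to 1 second and allowing 4 concurrent replies (both made-up numbers) cuts the estimate to 290 machines; `peak_factor` is where you would account for the uneven spacing before trusting any of this.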
brookst 7 days ago
But that’s not accurate. There are all sorts of tricks around the KV cache: different users’ requests start with the same first X tokens because they share system prompts, entire inputs/outputs can be cached when the context and user data are identical, and more. Not sure if you were just joking or really believe that, but for other people’s sake, it’s wildly wrong.
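To make the shared-prefix point concrete, here is a toy Python model. It is not how any particular inference server implements prefix caching; it assumes prompts are lists of integer token IDs, uses an arbitrary block size of 4, and `ToyPrefixCache`/`prefill` are hypothetical names, with strings standing in for the real key/value tensors. Each block is keyed on the whole prefix ending at that block, since attention state depends on every earlier token.

```python
from typing import Dict, List, Tuple


class ToyPrefixCache:
    """Toy model of KV-cache prefix reuse across requests.

    Prompts are split into fixed-size token blocks. A block's cache key is the
    entire token prefix ending at that block, so a block is reused only when
    every earlier token matches too.
    """

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        # Prefix -> placeholder standing in for that block's key/value state.
        self.blocks: Dict[Tuple[int, ...], str] = {}

    def prefill(self, tokens: List[int]) -> int:
        """Return how many tokens needed fresh computation for this prompt."""
        computed = 0
        for start in range(0, len(tokens), self.block_size):
            end = min(start + self.block_size, len(tokens))
            prefix = tuple(tokens[:end])
            if prefix not in self.blocks:
                # Cache miss: pretend to run the model over this block.
                self.blocks[prefix] = f"kv[{start}:{end}]"
                computed += end - start
        return computed


if __name__ == "__main__":
    cache = ToyPrefixCache()
    system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]  # shared by every user
    print(cache.prefill(system_prompt + [101, 102]))       # 10
    print(cache.prefill(system_prompt + [201, 202, 203]))  # 3
    print(cache.prefill(system_prompt + [101, 102]))       # 0
```

The second user pays only for their 3 unique tokens because the 8-token shared prefix is already cached, and an exact repeat costs nothing to prefill. Caching whole responses goes a step further and skips generation as well, but only when the entire input is identical.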
claytongulick 6 days ago
I'm pretty sure that's not right. They're definitely running ClusterKnoppix. :-)
rootsudo 7 days ago
Makes perfect sense, completely understand now!