akie 11 hours ago

Try doing it at scale for a whole office. Not trivial.

arjunchint 11 hours ago | parent | next [-]

There are plenty of US-based hosts racing to optimize and drive efficiencies.

It's a literal race on Twitter, with people posting about increasing token throughput and driving down costs on these Chinese open-source models.

ReptileMan 11 hours ago | parent | prev [-]

You could probably make do with a couple of instances. People rarely use AI 24/7, so right now you can oversubscribe and still have acceptable latency and a high utilization rate.
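
To put rough numbers on the oversubscription point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, and the office size, active fraction, and per-instance concurrency are hypothetical, not measurements of any real deployment.

```python
import math


def instances_needed(users, active_fraction, concurrent_per_instance):
    """Instances required to cover the expected number of simultaneously active users."""
    expected_active = users * active_fraction
    return max(1, math.ceil(expected_active / concurrent_per_instance))


def oversubscription_ratio(users, instances, concurrent_per_instance):
    """Users assigned per slot of concurrent capacity (above 1.0 means oversubscribed)."""
    return users / (instances * concurrent_per_instance)


# Hypothetical office: all three values below are assumptions, not data.
users = 200                  # people with access
active_fraction = 0.05       # share of them sending a request at any given moment
concurrent_per_instance = 8  # requests one instance serves with acceptable latency

n = instances_needed(users, active_fraction, concurrent_per_instance)
ratio = oversubscription_ratio(users, n, concurrent_per_instance)
print(f"instances: {n}, users per concurrent slot: {ratio}")
```

With these made-up inputs it comes out to 2 instances at 12.5 users per concurrent slot. The result is only as good as the active-fraction guess, and it sizes for average load, so bursts (everyone asking at 9 a.m.) would need extra headroom or a queue.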