reissbaker 2 hours ago
I'm biased because I run an inference company, https://synthetic.new. That said, I think we're pretty good at serving GLM-5.2 (and other models, like Kimi K2.7!), and our privacy policy is quite good: zero data retention for prompts and completions on API requests. Our average streaming TPS for GLM-5.2 (i.e., tokens per second after factoring out time-to-first-token, which varies with geography) was 97 tps over the last 24 hours, although it's slightly lower at peak traffic in the mornings PST, when it's 50-70 tps. We're also subscription-based, which is nicer for coding than per-token billing from e.g. Fireworks.
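To make the "streaming TPS" definition concrete, here is a minimal sketch of how a client could measure it, assuming you record an arrival timestamp for each streamed token. The function name `streaming_tps` and the timings are hypothetical, chosen for illustration; this is not synthetic.new's measurement code.

```python
def streaming_tps(request_start, token_arrivals):
    """Return (ttft_seconds, tokens_per_second) for one streamed response.

    request_start: time the request was sent, in seconds.
    token_arrivals: ascending arrival times of each token, in seconds.
    """
    if len(token_arrivals) < 2:
        raise ValueError("need at least two tokens to measure a streaming rate")
    first, last = token_arrivals[0], token_arrivals[-1]
    if last <= first:
        raise ValueError("token arrival times must span a positive duration")
    # Time-to-first-token: the part that is factored out of the rate.
    ttft = first - request_start
    # Tokens that arrived after the first one, over the time spent streaming them.
    tps = (len(token_arrivals) - 1) / (last - first)
    return ttft, tps


# Hypothetical timings: request sent at t=0, five tokens arrive from t=1.0 to t=3.0.
ttft, tps = streaming_tps(0.0, [1.0, 1.5, 2.0, 2.5, 3.0])
```

With these made-up timings, the first token takes 1 second and the remaining four arrive over 2 seconds, so the streaming rate is higher than a naive tokens-over-total-time figure would suggest.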
yieldcrv 2 hours ago
got a 500 error page on the site's chat, but I'll try the API