I'm biased because I run an inference company, https://synthetic.new. That said, I think we're pretty good at serving GLM-5.2 (and other models, like Kimi K2.7!), and our privacy policy is quite good: zero data retention for prompts and completions on API requests. Our average streaming TPS for GLM-5.2 (i.e., tokens per second after factoring out time-to-first-token, which varies with geography) was 97 tps over the last 24 hours, although it drops to 50-70 tps at peak traffic in the mornings (PST). We're also subscription-based, which is nicer for coding than per-token billing from providers like Fireworks.
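
For anyone who wants to compare numbers on their own requests, here is a rough sketch of what "streaming TPS" could mean in practice. It assumes you record a timestamp for each token as it arrives; the function names and the exact formula are my illustration, not necessarily how any provider computes its published figure.

```python
def streaming_tps(token_times: list[float]) -> float:
    """Tokens per second after the first token arrives.

    token_times holds the arrival time (in seconds) of each streamed token.
    Time-to-first-token is excluded by measuring from the first token onward.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    elapsed = token_times[-1] - token_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    # Tokens produced after the first one, divided by the time they took.
    return (len(token_times) - 1) / elapsed


def end_to_end_tps(request_start: float, token_times: list[float]) -> float:
    """Tokens per second including time-to-first-token, for comparison."""
    if not token_times:
        raise ValueError("need at least one token timestamp")
    elapsed = token_times[-1] - request_start
    if elapsed <= 0:
        raise ValueError("last token must arrive after the request starts")
    return len(token_times) / elapsed


if __name__ == "__main__":
    # Hypothetical timings: request sent at t=0, first token at t=2.0s.
    times = [2.0, 2.5, 3.0, 3.5, 4.0]
    print(streaming_tps(times))
    print(end_to_end_tps(0.0, times))
```

The gap between the two numbers is the time-to-first-token effect, which is why the streaming figure is the more geography-independent one.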

yieldcrv 2 hours ago

got a 500 error page on the site's chat, but I'll try the API

	reissbaker an hour ago
		Interesting: I don't see anything in our error logs, but we could be missing something (and personally, the chat works both for me and for my unsubscribed test account). If you email us at hi@synthetic.new, though, we should be able to fix anything you're running into!