campers 4 days ago
I had wondered if they run their inference at high batch sizes to improve throughput and keep their inference costs down. They do have a priority tier at double the cost, but I haven't seen any benchmarks on how much faster it actually is. The flex tier was an underrated feature of GPT-5: batch pricing with a regular API call. GPT-5.1 on the flex tier is an amazing price/intelligence tradeoff for non-latency-sensitive applications, without the extra plumbing most batch APIs require.
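For context, here's a minimal sketch of what opting into the flex tier looks like with the OpenAI Python SDK. The `service_tier` parameter is a real API field; the model name, prompt, and timeout value are placeholders, not a recommendation.

```python
# Minimal sketch: requesting flex processing via the OpenAI Python SDK.
# service_tier is a real parameter; "gpt-5.1" and the prompt below are
# placeholders for whatever model/workload you actually use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Summarize this document..."}],
    service_tier="flex",  # batch-style pricing on a regular (slower) API call
    timeout=900.0,        # flex requests can queue, so allow a generous timeout
)

print(response.choices[0].message.content)
```

One caveat: flex requests can be shed when capacity is tight, so anything production-grade would want retry logic or a fallback to the default tier.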
mips_avatar 4 days ago
I’m sure they do something like that. I’ve noticed Azure serves GPT-4.1 way faster than OpenAI does.