orbital-decay 2 hours ago

They do, though. Providers don't because batching makes it cheaper. Among the providers, DeepSeek seems to support it for v4 (and have actually optimized their kernels for batching), and Gemini Flash is "almost deterministic".

danpalmer 42 minutes ago | parent [-]

I'm pretty sure that the determinism issue is at the floating point math level, or even the hardware level. Just disabling batching and reducing the temperature to 0 does not result in truly deterministic answers.

orbital-decay 26 minutes ago | parent [-]

FP math itself is deterministic on real hardware, if the order of operations stays the same. Output reproducibility is much less of a problem than it seems, see for example https://docs.vllm.ai/en/latest/usage/reproducibility/
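To illustrate the point about order of operations, here is a minimal Python sketch. It is not taken from vLLM or any provider's stack, and the helper name `sum_in_order` is made up for the example. It shows that summing the same floats in the same order gives the same result on every run, while summing them in a different order can give a different result. The assumption here is that a changed batch size can change the reduction order inside a kernel, which would explain drift in outputs even though no single operation is random.

```python
def sum_in_order(values):
    """Add floats strictly left to right, the way a fixed reduction order would."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two different orders of accumulation.
order_a = [1e16, -1e16, 1.0]   # the large terms cancel first, then 1.0 is added
order_b = [1e16, 1.0, -1e16]   # 1.0 is absorbed by 1e16 before the cancellation

# Fixed order: repeating the computation gives an identical result.
run_1 = sum_in_order(order_a)
run_2 = sum_in_order(order_a)

# Changed order: the result differs, because float addition is not associative.
reordered = sum_in_order(order_b)

print(run_1, run_2, reordered)
```

So the nondeterminism people observe would come from whatever changes the order between runs, not from the arithmetic itself.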