orbital-decay 2 days ago

>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead

Pretty cool. I think they're the first to guarantee determinism with a fixed seed or at temperature 0. Google came close but never guaranteed it, AFAIK. DeepSeek show their roots: it may not strictly be a SotA model, but there's a ton of low-level optimization nobody else pays attention to.
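Why batch invariance is the hard part: floating-point addition isn't associative. If a kernel picks its reduction split based on batch size (a common throughput trick), the same prompt can get slightly different logits depending on what else shares the batch. At temperature 0, a tiny difference like that can occasionally flip the argmax and send generation down a different path. Here's a toy Python sketch of the effect. It is not DeepSeek's kernels; `split_for_batch` and its threshold are hypothetical stand-ins for a batch-size-dependent heuristic.

```python
def seq_sum(xs):
    # Strict left-to-right float accumulation. Avoids the built-in sum(),
    # which may use compensated summation for floats in newer Python versions.
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs, num_chunks):
    # Split the reduction into chunks, sum each chunk, then combine the
    # partial sums: roughly the shape of a "split" reduction in a GPU kernel.
    if not xs:
        return 0.0
    size = -(-len(xs) // num_chunks)  # ceiling division
    partials = [seq_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return seq_sum(partials)


def split_for_batch(batch_size):
    # Hypothetical heuristic: small batches underuse the GPU, so split the
    # reduction more finely to expose extra parallelism.
    return 2 if batch_size < 4 else 1


def reduce_batch_dependent(row, batch_size):
    # Reduction order depends on how many other requests share the batch.
    return chunked_sum(row, split_for_batch(batch_size))


def reduce_batch_invariant(row, batch_size, num_chunks=2):
    # batch_size is deliberately ignored: the split is fixed, so the
    # reduction order (and therefore the result bits) never changes.
    return chunked_sum(row, num_chunks)


# Floating-point addition is not associative.
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# 1e16 + 1.0 rounds back to 1e16, so the grouping changes the answer.
row = [1e16, 1.0, -1e16, 1.0]
print(reduce_batch_dependent(row, batch_size=1))  # 0.0 (two chunks)
print(reduce_batch_dependent(row, batch_size=8))  # 1.0 (one chunk)
print(reduce_batch_invariant(row, batch_size=1) == reduce_batch_invariant(row, batch_size=8))  # True
```

Batch-invariant kernels apply the same idea at scale: pick a reduction strategy that doesn't change with batch size. That can give up some parallelism at small batch sizes, which is presumably the overhead the quote claims to keep minimal.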

whatreason a day ago

There have been others for sure, but I'm not sure who was first https://vllm-website-pdzeaspbm-inferact-inc.vercel.app/blog/...

oofbey 20 hours ago

Nobody does it because it’s expensive. If you drop the requirement for perfect reproducibility, you open the door to lots of optimizations. Most people prefer faster, cheaper results over perfect reproducibility. When the model is intrinsically statistical, the value of perfect reproducibility is … limited.

orbital-decay 18 hours ago

Yeah, of course. Making it cheap/compatible with heavy batching is exactly what they did, that's what I mean. ("with minimal performance overhead")