zackangelo | 8 days ago
GPT-OSS will run even faster on Blackwell chips because of their hardware support for fp4. If anyone is working on training or inference in Rust, I'm currently adding fp8 and fp4 support to cudarc[0] and candle[1] so I can support these models in our inference engine at Mixlayer[2].

[0] https://github.com/coreylowman/cudarc/pull/449

[1] https://github.com/huggingface/candle/pull/2989

[2] https://mixlayer.com
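To make the fp4 point concrete: the E2M1 fp4 format (1 sign, 2 exponent, 1 mantissa bit) can only represent sixteen values, so quantizing a weight means snapping it to the nearest one. Here's a minimal dependency-free Rust sketch of that rounding step; it's an illustration of the number format only, not the cudarc/candle API, and real block formats like MXFP4 additionally carry a shared per-block scale factor that this omits:

```rust
/// The eight non-negative values representable in fp4 E2M1
/// (subnormals 0.0 and 0.5, then (1 + m/2) * 2^(e-1) for e = 1..=3).
const FP4_E2M1: [f32; 8] = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0];

/// Round a float to the nearest fp4 E2M1 value, saturating at +/-6.0.
fn quantize_fp4(x: f32) -> f32 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let mag = x.abs().min(6.0); // saturate to the largest representable magnitude
    let mut best = FP4_E2M1[0];
    for &v in &FP4_E2M1 {
        if (mag - v).abs() < (mag - best).abs() {
            best = v;
        }
    }
    sign * best
}

fn main() {
    for x in [0.3f32, 1.2, 2.6, -0.7, 10.0] {
        println!("{x} -> {}", quantize_fp4(x));
    }
}
```

The coarse spacing (only 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0 above zero) is why fp4 weights are normally stored alongside per-block scales: the scale stretches this tiny grid over each block's actual dynamic range.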
diggan | 8 days ago
Ah, interesting. As someone with an RTX Pro 6000: is it ready today for running gpt-oss-120b inference, or are there still missing pieces? Both linked PRs seem to be merged already, so I'm unsure whether it's ready to be played around with or not.