| ▲ | fnbr 11 hours ago |
| (I’m a researcher on the post-training team at Ai2.) 7B models are mostly useful for local use on consumer GPUs. A 32B could be used for a lot of applications. There are a lot of companies using fine-tuned Qwen 3 models that might want to switch to Olmo now that we have released a 32B base model. |
|
| ▲ | kurthr 2 hours ago | parent | next [-] |
Are there quantized (e.g. 4-bit) models available yet? I assume the training was done in BF16, and it seems like most inference models are distributed in BF16 until they're quantized. Edit: ah, I see it on Hugging Face:
https://huggingface.co/mlx-community/Olmo-3-1125-32B-4bit |
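For reference, a minimal sketch of loading that 4-bit MLX build with mlx-lm (assumes pip install mlx-lm on an Apple Silicon Mac, since MLX does not run on Intel Macs; the repo ID is the one linked above):

    # Minimal sketch, not an official example: load the 4-bit MLX quantization
    # linked above and generate a short completion.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Olmo-3-1125-32B-4bit")
    text = generate(model, tokenizer,
                    prompt="Briefly explain what a base model is.",
                    max_tokens=128)
    print(text)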
|
| ▲ | littlestymaar 11 hours ago | parent | prev [-] |
May I ask why you went for 7B and 32B dense models instead of a small MoE like Qwen3-30B-A3B or gpt-oss-20b, given how successful those MoE models have been? |
| |
▲ | fnbr 10 hours ago | parent | next [-] | | MoEs have a lot of technical complexity and aren't well supported in the open-source world. We plan to release a MoE soon(ish), and more after that once we have the tech in place to train them efficiently. I do think MoEs are clearly the future: for every use case except local usage, they're superior to dense models. | | |
▲ | trebligdivad 2 hours ago | parent [-] | | Even locally, MoEs are just so much faster, and they let you pick a larger or less-quantized model and still get useful speed. |
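Rough arithmetic behind that claim (illustrative numbers assumed here, not taken from the thread): local decode is usually memory-bandwidth-bound, so tokens/s scales with how few weight bytes you have to read per token.

    # Back-of-envelope sketch with assumed numbers: decode speed on a
    # bandwidth-bound local machine is roughly bandwidth / bytes read per token.
    bandwidth_gb_s = 100.0        # assumed memory bandwidth of a consumer machine
    bytes_per_weight = 0.5        # ~4-bit quantization

    configs = {
        "dense 32B (all weights active)": 32e9,
        "MoE 30B-A3B (~3B active per token)": 3e9,
    }
    for name, active_params in configs.items():
        tok_per_s = bandwidth_gb_s * 1e9 / (active_params * bytes_per_weight)
        print(f"{name}: ~{tok_per_s:.0f} tok/s")
    # The MoE still has to hold all ~30B weights in RAM (~15 GB at 4-bit),
    # but it reads ~10x fewer bytes per token, hence the speedup locally.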
| |
▲ | riazrizvi 10 hours ago | parent | prev [-] | | The 7B runs on my Intel MacBook Pro. There's a broad practical use served here for developers who need to figure out a project on their own hardware before committing to a bigger model, which improves the time/cost/effort economics. |
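A minimal sketch of that workflow with transformers on CPU; the model ID below is a guess at the naming pattern and should be checked against the Ai2 Hugging Face collection:

    # Sketch only: prototype against the 7B locally before moving to the 32B.
    # The repo ID is assumed (not confirmed in the thread); check the Ai2 page.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/Olmo-3-1125-7B"   # assumed ID for the 7B base model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)  # fp32 on CPU by default

    inputs = tokenizer("A quick local test prompt:", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))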
|