behnamoh 7 hours ago
> That said, faster inference can't come soon enough.

Why is that? Technical limits? I know Cerebras struggles with compute, and they stopped their coding plan (sold out!). Their architecture also hasn't been used with large models like GPT-5.2. The largest model they support (unquantized) is GLM 4.7, which is <500B params.