chrislattner | 2 days ago
If you want the fastest open-source implementation on Blackwell and AMD MI355, check out Modular's MAX nightly. It's a quick pip install; check it out here: https://www.modular.com/blog/day-zero-launch-fastest-perform...
-Chris Lattner (yes, affiliated with Modular :-)
|
nabakin | 2 days ago
Faster than TensorRT-LLM on Blackwell? Or do you not consider TensorRT-LLM open source because some dependencies are closed source?
melodyogonna | 2 days ago
I reviewed the TensorRT-LLM commit history from the past few days and couldn't find any updates regarding Gemma 4 support. By contrast, here is the reference for MAX: https://github.com/modular/modular/commit/57728b23befed8f3b4...
nabakin | 2 days ago
If OP meant they have the fastest implementation of Gemma 4 on Blackwell at the moment, I guess that is technically true. I doubt that will hold up once TensorRT-LLM finishes their implementation, though.
pama | 2 days ago
How is the sglang performance on Blackwell for this model?
nabakin | 2 days ago
Dunno, but there's a PR for it. Probably also more performant than Modular.
jjcm | 2 days ago
What % of a speedup should I expect vs. just running this with the standard PyTorch approach?
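The speedup question boils down to comparing generation throughput (tokens per second) between the two stacks and expressing the ratio as a percentage. A minimal sketch of that arithmetic, with purely hypothetical numbers (not measured results from MAX, TensorRT-LLM, or PyTorch):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput: tokens generated per wall-clock second."""
    return n_tokens / elapsed_s

def speedup_pct(candidate_tps: float, baseline_tps: float) -> float:
    """Percentage speedup of a candidate stack over a baseline stack."""
    return (candidate_tps / baseline_tps - 1.0) * 100.0

# Hypothetical timings for generating the same 1024 tokens on each stack:
baseline = tokens_per_second(1024, 8.0)   # baseline (e.g. stock PyTorch): 128 tok/s
candidate = tokens_per_second(1024, 5.0)  # optimized stack: 204.8 tok/s
print(f"{speedup_pct(candidate, baseline):.1f}% faster")  # prints "60.0% faster"
```

In practice you would measure both numbers yourself on the same hardware, prompt, and batch size, since published throughput figures rarely match your exact workload.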