lambda 3 hours ago
Yeah, I looked up some models I've actually run locally on my Strix Halo laptop, and it's predicting much lower performance than I actually get on the models I've tested. For MoE models, it should use the active parameters, not the total parameters, in the memory-bandwidth calculation.
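Here's the back-of-the-envelope math as a rough sketch. It assumes decode is memory-bandwidth bound and that every weight used for a token gets read from memory once per generated token, ignoring KV-cache reads and compute limits. The 256 GB/s bandwidth and the 30B-total / 3B-active model are illustrative assumptions, not measurements:

```python
# Rough, bandwidth-bound estimate of decode speed. Assumes every weight that
# participates in a token is read from memory exactly once per generated token;
# ignores KV-cache reads, compute limits, and cache effects.

def est_tokens_per_sec(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    bytes_per_token = params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Illustrative numbers only (hypothetical MoE with ~4-bit weights):
BANDWIDTH_GB_S = 256.0   # assumed memory bandwidth
TOTAL_PARAMS_B = 30.0    # total parameters, billions
ACTIVE_PARAMS_B = 3.0    # parameters active per token, billions
BYTES_PER_PARAM = 0.5    # ~4-bit quantization

wrong = est_tokens_per_sec(BANDWIDTH_GB_S, TOTAL_PARAMS_B, BYTES_PER_PARAM)
right = est_tokens_per_sec(BANDWIDTH_GB_S, ACTIVE_PARAMS_B, BYTES_PER_PARAM)
print(f"using total params:  {wrong:.1f} tok/s")   # 17.1
print(f"using active params: {right:.1f} tok/s")   # 170.7
```

With only a tenth of the weights active per token, the bandwidth-bound ceiling is about 10x higher, which is why using total parameters badly underestimates MoE speed.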