manmal 4 hours ago
We've already established in this thread that the memory bandwidth isn't that much greater than the M4 Max's (~12%?). However, I wonder if batched inference will benefit greatly from the vastly improved compute. My guess is that parallel use of the same model will be a few times faster: single-"threaded" use not much better, because decoding one stream is bound by how fast the weights can be streamed from memory, but if you run many requests in a batch, the weight reads are shared across the batch and the extra compute actually gets used, so it'd be way faster.
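The intuition can be checked with a back-of-envelope roofline sketch. All numbers below are made-up illustrative figures (a hypothetical 7B fp16 model and two hypothetical chips), not Apple specs:

```python
def decode_tokens_per_sec(weight_bytes, bw_gb_s, tflops, batch):
    """Rough tokens/sec for one decode step; ignores KV cache and overheads."""
    n_params = weight_bytes / 2             # assume fp16 weights: 2 bytes/param
    t_mem = weight_bytes / (bw_gb_s * 1e9)  # weights streamed once per step, shared by the whole batch
    t_compute = 2 * n_params * batch / (tflops * 1e12)  # ~2 FLOPs per param per sequence
    return batch / max(t_mem, t_compute)    # step is limited by the slower of the two

# Two hypothetical chips: identical bandwidth (500 GB/s), 4x compute gap.
single_slow = decode_tokens_per_sec(14e9, 500, 15, batch=1)
single_fast = decode_tokens_per_sec(14e9, 500, 60, batch=1)

batch_slow = decode_tokens_per_sec(14e9, 500, 15, batch=64)
batch_fast = decode_tokens_per_sec(14e9, 500, 60, batch=64)
```

With batch=1 both chips are memory-bound, so the extra compute buys nothing and `single_slow == single_fast`. At batch=64 the slower chip tips into being compute-bound while the faster one stays bandwidth-limited, so the faster chip comes out roughly 2x ahead despite identical memory bandwidth.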
andy_ppp an hour ago
Is this a reply to a different comment?