| ▲ | frankc 5 days ago | |
My main question is whether this would let me speed up prompt processing for large MoE models. That is the real bottleneck on the M3 Ultra. The tokens per second is pretty good. | ||
| ▲ | embedding-shape 5 days ago | parent [-] | |
tinygrad does have pretty neat support for sharding things across various devices relatively easily, which would help. I'm guessing you'd hit the bandwidth ceiling instead, though, from transferring data back and forth between devices. | ||