varispeed 5 days ago
So would 40x RPi 5 get 130 tokens/s?
SillyUsername 5 days ago
I imagine it would be limited by the number of layers, and at some point you'd also hit diminishing returns caused by network latency.
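A rough way to picture the diminishing returns is a toy model where per-token compute divides evenly across nodes but every layer costs one fixed network synchronization. This is only a sketch: `tokens_per_second`, the one-sync-per-layer assumption, and all the numbers below are hypothetical, not measurements from any real cluster.

```python
def tokens_per_second(nodes: int, single_node_tps: float,
                      n_layers: int, sync_latency_s: float) -> float:
    """Toy throughput estimate for a model split across `nodes` machines.

    Assumptions (hypothetical): compute time scales as 1/nodes, and each
    layer adds one fixed synchronization delay once more than one node
    is involved.
    """
    compute_s = 1.0 / (single_node_tps * nodes)
    network_s = 0.0 if nodes == 1 else n_layers * sync_latency_s
    return 1.0 / (compute_s + network_s)


if __name__ == "__main__":
    # Hypothetical inputs: 4 tokens/s on one node, 32 layers, 1 ms per sync.
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(tokens_per_second(n, 4.0, 32, 0.001), 2))
```

In this model the throughput can never exceed 1 / (n_layers * sync_latency_s), no matter how many nodes are added, which is the ceiling the comment is pointing at.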
reilly3000 5 days ago
It has to be 2^n nodes, and you're limited to at most one node per attention head in the model.
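Taking that constraint at face value, the number of boards you could actually put to work is the largest power of two that fits within both the boards you have and the model's head count. A minimal sketch, where `usable_nodes` is a made-up helper and the head counts are hypothetical examples rather than figures for any particular model:

```python
def usable_nodes(available: int, n_heads: int) -> int:
    """Largest power of two that is <= both `available` and `n_heads`.

    Returns 0 if no node can be used. Encodes only the constraint stated
    in the comment above (2^n nodes, at most one node per attention head).
    """
    limit = min(available, n_heads)
    if limit < 1:
        return 0
    # bit_length() - 1 is the exponent of the highest set bit.
    return 1 << (limit.bit_length() - 1)


if __name__ == "__main__":
    print(usable_nodes(40, 32))  # 32: capped by a hypothetical 32 heads
    print(usable_nodes(40, 8))   # 8: capped by a hypothetical 8 heads
```

So under these assumptions, 40 boards would leave at least 8 of them idle.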
VHRanger 5 days ago
Most likely not, because of NUMA bottlenecks.