adrian_b an hour ago
In a computer with two PCIe 5.0 SSDs, or one with a PCIe 5.0 SSD and a PCIe 4.0 SSD, it should be possible to stream weights from the SSDs at 20 GB/s or even more. That is not just a little faster; it is about 10 times faster than on your system, so a generation speed of a couple of tokens per second should be achievable. Nowadays even many NUCs and NUC-like mini-PCs have such SSD slots. I have actually started working on optimizing such an inference system, so your data is helpful for comparison.
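A rough back-of-the-envelope sketch of the arithmetic, assuming the SSDs are the only bottleneck and reads stripe perfectly across drives. The per-drive rates (about 14 GB/s for a PCIe 5.0 x4 drive, about 7 GB/s for PCIe 4.0 x4), the ~2 GB/s baseline (inferred from "10 times faster") and the 10 GB of weights read per token are all hypothetical round numbers, not measurements:

```python
# Hypothetical estimate: decode speed when each generated token requires
# streaming a fixed amount of weights from SSD (no caching in RAM assumed).

# Assumed round-number sequential-read rates for x4 NVMe drives.
PCIE5_SSD_GBPS = 14.0  # GB/s, high-end PCIe 5.0 x4 drive (assumption)
PCIE4_SSD_GBPS = 7.0   # GB/s, high-end PCIe 4.0 x4 drive (assumption)


def aggregate_bandwidth(drive_rates_gbps):
    """Ideal aggregate bandwidth if reads are striped evenly across all drives."""
    return sum(drive_rates_gbps)


def tokens_per_second(bandwidth_gbps, gb_read_per_token):
    """Upper bound on generation speed when SSD reads are the only limit."""
    return bandwidth_gbps / gb_read_per_token


if __name__ == "__main__":
    # Hypothetical: about 10 GB of weights must be read per generated token.
    GB_PER_TOKEN = 10.0
    configs = {
        "2x PCIe 5.0": [PCIE5_SSD_GBPS, PCIE5_SSD_GBPS],
        "PCIe 5.0 + PCIe 4.0": [PCIE5_SSD_GBPS, PCIE4_SSD_GBPS],
        "~2 GB/s baseline (inferred)": [2.0],
    }
    for name, rates in configs.items():
        bw = aggregate_bandwidth(rates)
        tps = tokens_per_second(bw, GB_PER_TOKEN)
        print(f"{name}: {bw:.0f} GB/s -> {tps:.1f} tok/s")
```

Real systems will land below these bounds because of compute time, PCIe lane sharing between slots, and imperfect overlap of I/O with computation.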