simonw 2 days ago
There have been some very interesting experiments with streaming from SSD recently: https://simonwillison.net/2026/Mar/18/llm-in-a-flash/
EnPissant 2 days ago
I don't mean to be a jerk, but a 2-bit quant, experts reduced from 10 to 4, no way of knowing whether the test ran long enough for the SSD to thermal throttle, and still only 5.5 tokens/s does not sound useful to me.