mannyv 5 hours ago
The software has real software engineers working on it instead of researchers. Remember when people were arguing about whether to use mmap? What a ridiculous argument. At some point someone will figure out how to tile the weights and the memory requirements will drop again.
snovv_crash 5 hours ago
The real improvement will be when the software engineers get into the training loop. Then we can have MoE models that use cache-friendly expert utilisation and maybe even learned prefetching for which experts will be needed next.