storystarling 5 hours ago
I suspect the bottleneck on 12+ year old hardware wouldn't be power but the interconnects. SOTA training is bound by gradient synchronization latency. Without NVLink you hit a hard wall where the compute spends most of its time waiting on PCIe or Ethernet.
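To put rough numbers on that, here is a back-of-envelope sketch. It assumes plain data-parallel training with a ring all-reduce over the full gradient, no overlap between communication and compute, and no gradient compression. The function names and every figure in the example (parameter count, worker count, link bandwidths) are illustrative placeholders rather than measurements of any real system.

```python
def ring_allreduce_seconds(param_count, bytes_per_param, n_workers, link_bytes_per_s):
    """Bandwidth-only estimate for one ring all-reduce of the full gradient.

    In a ring all-reduce each worker sends (and receives) about
    2 * (N - 1) / N times the gradient size. Per-hop latency is ignored.
    """
    if n_workers < 2:
        return 0.0  # a single worker has nothing to synchronize
    payload_bytes = param_count * bytes_per_param
    return 2 * (n_workers - 1) / n_workers * payload_bytes / link_bytes_per_s


def comm_fraction(compute_s, comm_s):
    """Share of a training step spent waiting, assuming no compute/comm overlap."""
    return comm_s / (compute_s + comm_s)


if __name__ == "__main__":
    # Hypothetical setup: 1e9 parameters, 2-byte gradients, 8 workers.
    params, grad_bytes, workers = 1e9, 2, 8
    slow_link = 10e9    # assumed effective bytes/s on a slow link
    fast_link = 300e9   # assumed effective bytes/s on a fast link
    compute_s = 0.35    # assumed per-step compute time

    for name, bw in (("slow link", slow_link), ("fast link", fast_link)):
        t = ring_allreduce_seconds(params, grad_bytes, workers, bw)
        print(f"{name}: sync {t:.4f} s, waiting {comm_fraction(compute_s, t):.0%} of the step")
```

With these made-up inputs the slow link spends as long synchronizing as computing, which is the "hard wall" in question. Real systems overlap communication with the backward pass, so treat this as an upper bound on the stall.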
fc417fc802 4 hours ago
Fair point. Though if this were actually attempted, I imagine it would start with changes to the model architecture, the physical hardware, or both. My hypothetical is probably somewhat over the top anyway: isn't China somewhere in the vicinity of 7 nm at present?