inkysigma 9 hours ago
What makes you think that the entire process isn't being made more efficient? There are entire papers dedicated to squeezing more useful FLOPs out of GPUs so that less energy is wasted simply moving data around memory. There are also inference-side optimizations like speculative decoding and MoE, though some of these make training more expensive.

The other big problem is that you can always scale up to absorb whatever efficiency you gain. I do wonder if that will eventually level off, though. If performance plateaus at some point, presumably the efficiency gains will catch up. That said, that doesn't seem likely in the near future.
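To illustrate the kind of inference-side trick I mean, here's a rough sketch of speculative decoding with toy stand-in models (the vocab and model functions are made up for illustration, not any real API): a cheap draft model proposes a few tokens and the expensive target model verifies them in one pass, accepting each with probability min(1, p_target/p_draft).

    import random

    VOCAB = ["a", "b", "c", "d"]

    def draft_model(ctx):
        # Hypothetical cheap model: just a fixed, slightly skewed distribution here.
        return {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

    def target_model(ctx):
        # Hypothetical expensive model: the distribution we actually want to sample from.
        return {"a": 0.35, "b": 0.35, "c": 0.2, "d": 0.1}

    def sample(dist):
        r, acc = random.random(), 0.0
        for tok, p in dist.items():
            acc += p
            if r <= acc:
                return tok
        return tok

    def speculative_step(context, k=4):
        # Draft k tokens cheaply, then verify them with the target model.
        drafts, ctx = [], list(context)
        for _ in range(k):
            tok = sample(draft_model(ctx))
            drafts.append(tok)
            ctx.append(tok)

        accepted, ctx = [], list(context)
        for tok in drafts:
            p_t, p_d = target_model(ctx), draft_model(ctx)
            if random.random() < min(1.0, p_t[tok] / p_d[tok]):
                accepted.append(tok)  # draft token accepted essentially "for free"
                ctx.append(tok)
            else:
                # First rejection: resample from the residual max(0, p_t - p_d), renormalized.
                residual = {t: max(0.0, p_t[t] - p_d[t]) for t in VOCAB}
                z = sum(residual.values()) or 1.0
                accepted.append(sample({t: p / z for t, p in residual.items()}))
                break
        return accepted

    print(speculative_step(["<s>"]))

The point being: one pass of the expensive model can emit several tokens without changing the output distribution, which is where the latency/energy win comes from.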