jiggawatts 3 hours ago
Yes. Even the latest NVIDIA Blackwell GPUs are general purpose, albeit with negligible "graphics" capabilities. They can run fairly arbitrary C/C++ code with only some limitations, and the area of the chip dedicated to matrix products (the "tensor cores") is relatively small: less than 20% of the die! Conversely, Google's TPUs dedicate a large area of each chip to pure tensor ops, hence the name. This is partly why Google's Gemini is roughly 4x cheaper to serve than OpenAI's GPT-5 models.

Jensen Huang has said in recent interviews that he stands by the decision to keep NVIDIA GPUs more general purpose, because this keeps them flexible and adaptable to future AI designs, not just the current architectures. That may or may not pan out.

I strongly suspect that the winning chip architecture will have about 80% of its area dedicated to tensor units, very little onboard cache, and model weights streamed in from High Bandwidth Flash (HBF). This would be dramatically lower power and cost than the hardware typically used today.

Something to consider: as the matrices in a model scale up, the compute needed for matrix multiplication grows as the cube of their size (multiplying two n×n matrices takes on the order of n^3 operations), while the miscellaneous operations such as softmax, ReLU, etc. scale only linearly with the size of the vectors involved. Hence, as models scale into the trillions of parameters, the matrix multiplications ("tensor" ops) dominate everything else.
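A quick back-of-the-envelope sketch of that scaling argument (the matrix sizes and the per-element op count are illustrative assumptions, not figures from any real model):

```python
# Compare the cost of an n x n matrix multiply (~2*n^3 FLOPs:
# a multiply and an add for each of n*n outputs, n terms each)
# against element-wise ops like ReLU/softmax on an n-vector.

def matmul_flops(n):
    return 2 * n**3

def elementwise_flops(n, c=5):
    # c ops per element is a generous estimate for softmax/ReLU/etc.
    return c * n

for n in (1024, 4096, 16384):
    ratio = matmul_flops(n) / elementwise_flops(n)
    print(f"n={n:6d}: matmul/elementwise FLOP ratio ~ {ratio:,.0f}")
```

The ratio grows as n^2, so by the time the hidden dimensions reach transformer-scale sizes, the non-matmul ops are a rounding error in the FLOP budget, which is exactly the case for spending die area on tensor units rather than general-purpose compute.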