cavisne 3 days ago
The problem is the hardware, not the software, and specifically not CUDA. Triton, for example, writes PTX directly (a level below CUDA). Trying to copy Nvidia hardware exactly means you will always be a generation behind, so they are forced to guess which different direction will turn out to be useful. So far those guesses haven't worked out (not surprising, as they have no specific ML expertise and are not partnered with any frontier lab), and no amount of papering over with software will help. That said, I'm hopeful the rise of reasoning models can help: no one wants to bet the farm on their untested clusters, but buying some chips for inference is much safer.