otterley 7 hours ago
I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token. So it’s not about token generation rate (tokens per second).
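To make the distinction concrete, here's a minimal sketch of how the two metrics differ when timing a streaming LLM. The `fake_llm_stream` generator is a hypothetical stand-in (not any real API): it models the prefill (prompt-processing) delay before the first token, then a per-token decode delay. Time to first token captures the prefill; tokens/sec describes only the decode phase.

```python
import time

def fake_llm_stream(prompt, n_tokens=5, prefill_s=0.2, per_token_s=0.05):
    # Hypothetical stand-in for a streaming LLM API: prompt processing
    # (prefill) happens entirely before the first token is yielded.
    time.sleep(prefill_s)            # prefill / prompt processing
    for i in range(n_tokens):
        time.sleep(per_token_s)      # decode phase, one token at a time
        yield f"tok{i}"

def measure(stream):
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start   # time to first token
        count += 1
    total = time.perf_counter() - start
    # tokens/sec for the decode phase only (excludes prefill time)
    decode_tps = (count - 1) / (total - ttft) if count > 1 else 0.0
    return ttft, decode_tps

ttft, tps = measure(fake_llm_stream("hello"))
```

Speeding up prefill (what the "4x" claim refers to) shrinks `ttft` but leaves `decode_tps` unchanged.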
aurareturn 6 hours ago | parent | next
Yes, this is known. They added neural accelerators (the equivalent of Tensor cores) to the GPU. This should make prompt processing competitive with similar-class GPUs.
jasonjmcghee 7 hours ago | parent | prev
It would probably be worth finding a friendlier way to market this, but it's a reasonable and accurate way to put it. Prompt processing sped up, not output generation. The M4 was notoriously slow at this compared to the DGX etc.