| ▲ | djsjajah 8 hours ago | |
> Do you really though? Yes. It stays in on the hbm but it need to get shuffled to the place where it can actually do the computation. It’s a lot like a normal cpu. The cpu can’t do anything with data in the system memory, it has to be loaded into a cpu register. For every token that is generated, a dense llm has to read every parameter in the model. | ||