froh an hour ago
> GPUs are extremely underutilized if you launch just 1 generation stream

Why is that? B/c the thing is waiting for the hoooman and idling? Or some parallelizable interleaving steps? I have no intuition yet how this works under the hood.