hu3 16 hours ago
> only a small fraction will be needed in VRAM at any given time

I don't think that's true, at least not without heavy performance loss, in which case "just be memory mapped" is doing a lot of work here. By that logic, GPUs could run models much larger than their VRAM would otherwise allow, which doesn't seem to be the case unless heavy quantization is involved.
zozbot234 15 hours ago
Existing GPU APIs are sadly not conducive to this kind of memory mapping with automatic swap-in. The closest thing you get, as I understand it, is "sparse" allocations in VRAM, where only a small fraction of your "virtual address space" equivalent is mapped to real data at any given time, and the mapping can change dynamically.
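To make the sparse-allocation idea concrete, here is a small CPU-side analogy in Python (hypothetical, not an actual GPU API): a large virtual address range where a fixed-size page only gets backing storage when it is first written, similar in spirit to sparse resource binding in Vulkan/Direct3D 12 or CUDA's virtual memory management APIs. Names like `SparseSpace` are made up for illustration.

```python
PAGE_SIZE = 64 * 1024  # 64 KiB, a typical sparse-binding granularity

class SparseSpace:
    """A huge virtual range where only touched pages are 'resident'."""

    def __init__(self, virtual_size):
        self.virtual_size = virtual_size
        self.pages = {}  # page index -> bytearray (resident pages only)

    def _page(self, addr, create):
        idx = addr // PAGE_SIZE
        if idx not in self.pages:
            if not create:
                return None  # unmapped page; reads see zeros
            self.pages[idx] = bytearray(PAGE_SIZE)  # "bind" real memory
        return self.pages[idx]

    def write(self, addr, data):
        for i, b in enumerate(data):
            page = self._page(addr + i, create=True)
            page[(addr + i) % PAGE_SIZE] = b

    def read(self, addr, n):
        out = bytearray()
        for i in range(n):
            page = self._page(addr + i, create=False)
            out.append(page[(addr + i) % PAGE_SIZE] if page else 0)
        return bytes(out)

    def resident_bytes(self):
        return len(self.pages) * PAGE_SIZE

# Reserve a 1 TiB virtual range, but touch only one page of it.
space = SparseSpace(virtual_size=1 << 40)
space.write(123 * PAGE_SIZE, b"weights")
print(space.resident_bytes())  # 65536: one page backed, the rest unmapped
```

The point of the analogy: the application sees one big contiguous range, while actual memory is committed page by page. What current GPU APIs lack is the automatic, fault-driven swap-in that OS page caches give mmap'd files on the CPU; with sparse resources, the application itself must decide which pages to bind and unbind.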