mrob 6 hours ago
LLM inference is mostly read-only, so high-bandwidth flash looks like it could offer huge cost savings over VRAM. It's not in commercial products yet, but there are already working prototypes. Previous HN discussion:
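A back-of-the-envelope sketch of the bandwidth argument (the model size, weight precision, and token rate below are all assumed for illustration): generating each token streams roughly the full weight set through the compute units, so sustained read bandwidth, not write performance, sets the token rate, which is why a read-optimized medium could stand in for VRAM on the weight side.

    # Rough bandwidth math for single-stream token generation.
    # All numbers are illustrative assumptions, not measurements.

    PARAMS = 70e9          # assumed model size: 70B parameters
    BYTES_PER_PARAM = 2    # assumed fp16/bf16 weights
    TOKENS_PER_SEC = 20    # assumed target generation speed

    # Each generated token reads (roughly) every weight once,
    # so the required sustained read bandwidth is:
    weight_bytes = PARAMS * BYTES_PER_PARAM
    bandwidth_needed = weight_bytes * TOKENS_PER_SEC  # bytes/sec

    print(f"Weights: {weight_bytes / 1e9:.0f} GB")
    print(f"Read bandwidth for {TOKENS_PER_SEC} tok/s: "
          f"{bandwidth_needed / 1e12:.1f} TB/s")

    # Writes during inference are tiny by comparison (mostly the
    # KV cache), which is the "mostly read-only" point above.

Under these assumptions that works out to about 140 GB of weights and roughly 2.8 TB/s of sustained reads, comparable to a current HBM part, with almost no write traffic.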