amazingamazing 10 hours ago

More benchmaxxing, I see. Too bad there's no rig with 256GB of unified RAM for under $1,000.

cpburns2009 5 hours ago

Sir, this is 2026. You're not getting even 128GB of RAM for under $1k.

kennethops 10 hours ago

Do you know if they applied this to it?

https://research.google/blog/turboquant-redefining-ai-effici...

kgeist 10 hours ago

Llama.cpp already uses an idea from it internally for the KV cache [0]

So a quantized KV cache should now see less degradation.

[0] https://github.com/ggml-org/llama.cpp/pull/21038
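As a rough illustration of why this can reduce degradation, here is a minimal sketch that assumes the borrowed idea is the rotate-before-quantizing trick (TurboQuant-style methods rotate vectors so their coordinates are easier to quantize). This is not necessarily what the linked PR implements. A single large outlier channel forces a huge absmax scale, so plain 4-bit rounding zeroes out most other channels; a random sign flip followed by a Walsh-Hadamard transform spreads the outlier across every coordinate first. The quantizer uses one scale per vector, which is simpler than llama.cpp's block-wise formats, and the outlier vector is synthetic.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Walsh-Hadamard matrix of size n (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])  # Sylvester construction
    return h / np.sqrt(n)


def quantize_int4(x: np.ndarray) -> np.ndarray:
    """Symmetric 4-bit round trip with one absmax scale for the whole vector."""
    scale = np.max(np.abs(x)) / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale  # dequantized values


def rotated_int4(x: np.ndarray, signs: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Rotate, quantize, then undo the rotation."""
    r = h @ (signs * x)  # random sign flips + Hadamard = an orthogonal rotation
    r_hat = quantize_int4(r)
    return signs * (h.T @ r_hat)  # inverse: H is orthogonal, signs are +/-1


rng = np.random.default_rng(0)
d = 128
x = rng.normal(size=d)
x[3] = 40.0  # hypothetical outlier channel that dominates the absmax scale
signs = rng.choice([-1.0, 1.0], size=d)
h = hadamard(d)

err_plain = np.linalg.norm(x - quantize_int4(x))
err_rot = np.linalg.norm(x - rotated_int4(x, signs, h))
print(f"plain int4 error:   {err_plain:.3f}")
print(f"rotated int4 error: {err_rot:.3f}")
```

Because the rotation preserves vector norms, the rotated version trades one huge scale for a moderate one shared by all coordinates, and the reconstruction error drops accordingly.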

bigyabai 9 hours ago

taps the sign

  Unified Memory Is A Marketing Gimmick. Industrial-Scale Inference Servers Do Not Use It.

rcxdude 39 minutes ago

Unified memory is mainly how consumer hardware gets enough GPU-accessible RAM to run larger models, because otherwise market segmentation jacks up the price substantially.
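Back-of-envelope capacity math shows why GPU-accessible memory is the binding constraint: the weights alone take roughly parameters × bits per weight / 8 bytes. A minimal sketch; the 70B model size, the 24 GB and 128 GB capacities and the 90% headroom are illustrative assumptions, and the KV cache and runtime overhead are ignored.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations and runtime overhead, so real usage is higher.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         capacity_gb: float, headroom: float = 0.9) -> bool:
    """True if the weights fit in GPU-accessible memory with some headroom.

    headroom=0.9 is an arbitrary assumption that leaves ~10% for the KV cache,
    the OS and other allocations.
    """
    return weights_gb(params_billion, bits_per_weight) <= capacity_gb * headroom


# Hypothetical 70B-parameter dense model against two illustrative capacities:
# a 24 GB discrete card and a 128 GB unified-memory pool.
for capacity in (24, 128):
    for bits in (4, 8, 16):
        need = weights_gb(70, bits)
        verdict = "fits" if fits(70, bits, capacity) else "does not fit"
        print(f"{capacity:>3} GB, {bits:>2}-bit: needs {need:.0f} GB, {verdict}")
```

Under these assumptions the 24 GB card holds none of the variants, while the 128 GB pool holds the 4-bit and 8-bit ones.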

bigyabai 7 minutes ago

UMA removes the PCIe bottleneck and replaces it with a memory-controller and bandwidth bottleneck. For most high-performance GPUs, whose dedicated VRAM is much faster, that would be a direct downgrade.
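The trade-off can be sketched with a crude bandwidth-bound model of single-stream decoding for a dense model: each generated token reads every weight once, so time per token is roughly bytes read divided by bandwidth. This assumes weights that don't fit in VRAM are streamed over PCIe on every token, which is only one of several offload strategies. The bandwidths are round illustrative numbers, not specs of real products (32 GB/s is roughly PCIe 4.0 x16 in one direction). Compute limits, the KV cache and batching are ignored.

```python
def decode_tokens_per_s(weight_gb: float, fast_gb: float,
                        fast_bw_gbs: float, slow_bw_gbs: float) -> float:
    """Upper bound on single-stream decode speed (tokens/s) for a dense model.

    Simplified model (an assumption): every token reads every weight once.
    Weights that fit in fast memory are read at fast_bw_gbs; the remainder is
    streamed over a slower link at slow_bw_gbs.
    """
    in_fast = min(weight_gb, fast_gb)
    spilled = weight_gb - in_fast
    seconds_per_token = in_fast / fast_bw_gbs + spilled / slow_bw_gbs
    return 1.0 / seconds_per_token


WEIGHTS_GB = 70.0  # e.g. a hypothetical 70B model at 8 bits per weight

# Illustrative round numbers, not measurements:
spill = decode_tokens_per_s(WEIGHTS_GB, fast_gb=24, fast_bw_gbs=1000, slow_bw_gbs=32)
uma = decode_tokens_per_s(WEIGHTS_GB, fast_gb=128, fast_bw_gbs=400, slow_bw_gbs=32)
big_vram = decode_tokens_per_s(WEIGHTS_GB, fast_gb=80, fast_bw_gbs=3000, slow_bw_gbs=32)

print(f"24 GB card, rest over PCIe: {spill:.2f} tok/s")
print(f"128 GB unified pool:        {uma:.2f} tok/s")
print(f"80 GB card, all in VRAM:    {big_vram:.2f} tok/s")
```

With these numbers the spilled case is bounded at under one token per second, the unified pool at roughly six, and the card that holds the whole model in fast VRAM at over forty. That matches both halves of the comment: UMA beats spilling over PCIe, but loses to a GPU whose dedicated memory is both large and fast enough.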

zozbot234 9 hours ago

Industrial-scale inference is moving toward LPDDR memory (alongside HBM), which is essentially what "Unified Memory" is.

0x457 5 hours ago

> which is essentially what "Unified Memory" is.

Unified memory is when the CPU and GPU can reference the same memory addresses without anything being copied. (CUDA lets you write code as if memory were unified even when it isn't, so that doesn't count, but HMM, Linux's Heterogeneous Memory Management, does count [1].)

That is all. Whatever technology sits underneath is a hardware detail. Unified memory on Macs lets you put something into memory and then compute on it with the CPU, ANE, ANA, or Metal shaders, all without copying anything.

DGX Spark also has unified memory.

[1]: https://docs.nvidia.com/cuda/cuda-programming-guide/02-basic...

bigyabai 9 hours ago

LPDDR is LPDDR. There's nothing "unified" about it architecturally.