frontsideair 3 days ago

I'm interested in this. My impression was that the newer chips have unified memory and high memory bandwidth. Do you do inference on the CPU or the external GPU?

Damogran6 3 days ago | parent [-]

I don't; I'm a REALLY light user. Smaller LLMs work pretty well. I used a 40 GB LLM and it was _pokey_, but it worked, and switching between them is pretty easy. This is a 12-core Xeon with 64 GB RAM... my M4 mini is... okay with smaller LLMs. I have a Ryzen 9 with an RTX 3070 Ti that's the best of the bunch, but none of this holds a candle to people who spend real money to experiment in this field.