Nothing special?

I mean, inference engine might need to get some tweaks, to support whatever compute is available. But then, if you put a few terabytes of disk for swap, and replace RAM to bigger sticks if possible, it should work? Slowly, of course, but there is no reason it should not to.

▲

reverius42 6 hours ago | parent [-]

The big difference will be measuring seconds per token instead of tokens per second.

	▲	martijnvds 3 hours ago \| parent [-]
		Seconds per token is just fractional tokens per second ;)