reisse 7 hours ago

Nothing special?

I mean, the inference engine might need some tweaks to support whatever compute is available. But then, if you add a few terabytes of disk for swap and replace the RAM with bigger sticks where possible, it should work? Slowly, of course, but there is no reason it shouldn't.

reverius42 6 hours ago | parent [-]

The big difference will be measuring seconds per token instead of tokens per second.

martijnvds 3 hours ago | parent [-]

Seconds per token is just fractional tokens per second ;)
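
To make the reciprocal relation concrete, here is a minimal sketch of the conversion. The function names and the 4-second figure are made up for illustration, not measurements from any real machine:

```python
def to_tokens_per_second(seconds_per_token: float) -> float:
    """Convert a per-token latency (s/token) into throughput (tokens/s)."""
    if seconds_per_token <= 0:
        raise ValueError("seconds_per_token must be positive")
    return 1.0 / seconds_per_token


def to_seconds_per_token(tokens_per_second: float) -> float:
    """Convert throughput (tokens/s) into per-token latency (s/token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1.0 / tokens_per_second


# Hypothetical example: a swap-bound machine taking 4 seconds per token
# is running at a quarter of a token per second.
slow_rate = to_tokens_per_second(4.0)
print(f"{slow_rate} tokens/s")
```

Either unit describes the same speed; the one that gives a number above 1 is usually easier to read.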