reisse 7 hours ago
Nothing special? I mean, the inference engine might need some tweaks to support whatever compute is available. But then, if you put a few terabytes of disk in as swap, and replace the RAM with bigger sticks if possible, it should work? Slowly, of course, but there is no reason it shouldn't.
reverius42 6 hours ago
The big difference will be measuring seconds per token instead of tokens per second.
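
A rough back-of-envelope sketch of why: if the weights live in disk swap, every generated token has to stream the whole model off disk, so decode speed is roughly model size divided by swap read bandwidth. The numbers below are illustrative assumptions, not measurements:

    # Rough estimate of decode speed when weights are paged in from disk swap.
    # Assumes a dense model (all weights read once per generated token);
    # model size and disk bandwidth are illustrative assumptions.

    model_size_gb = 140          # e.g. a ~70B-parameter model at fp16 (assumption)
    disk_read_gb_per_s = 3.5     # sequential read of a typical NVMe SSD (assumption)

    seconds_per_token = model_size_gb / disk_read_gb_per_s
    print(f"~{seconds_per_token:.0f} s/token, i.e. {1 / seconds_per_token:.3f} tokens/s")
    # -> ~40 s/token, i.e. 0.025 tokens/s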