You can already run inference on ordinary hardware, but to get workable throughput you are limited to small models, and these have very poor world knowledge.