It's not even hard, just slow. You could do it on a single cheap server (compared to a rack full of GPUs): run a CPU LLM inference engine and limit it to a single thread.
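
As a rough sketch, here's what that looks like with llama-cpp-python (just one example of a CPU inference engine; the model path and prompt below are placeholders):

```python
# Minimal sketch: single-threaded CPU inference via llama-cpp-python
# (pip install llama-cpp-python). Any GGUF-quantized model works;
# the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",  # placeholder model file
    n_threads=1,        # token generation pinned to one CPU thread
    n_threads_batch=1,  # prompt processing also pinned to one thread
)

out = llm("Explain why single-threaded inference is slow:", max_tokens=64)
print(out["choices"][0]["text"])
```

The llama.cpp CLI equivalent is just passing `-t 1`. Tokens per second will be painful, but it runs.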