People have tried to run Qwen3-235B-A22B-Thinking-2507 on 4x $600 used, Nvidia 3090s with 24 GB of VRAM each (96 GB total), and while it runs, it is too slow for production grade (<8 tokens/second). So we're already at $2400 before you've purchased system memory and CPU; and it is too slow for a "Sonnet equivalent" setup yet...
You can quantize it of course, but if the idea is "as close to Sonnet as possible," then while quantized models are objectively more efficient they are sacrificing precision for it.
So next step is to up that speed, so we're at 4x $1300, Nvidia 5090s with 32 GB of VRAM each (128 GB), or $5,200 before RAM/CPU/etc. All of this additional cost to increase your tokens/second without lobotomizing the model. This still may not be enough.
I guess my point is: You see this conversation a LOT online. "Qwen3 can be near Sonnet!" but then when asked how, instead of giving you an answer for the true "near Sonnet" model per benchmarks, they suddenly start talking about a substantially inferior Qwen3 model that is cheap to run at home (e.g. 27B/30B quantized down to Q4/Q5).
The local models absolutely DO exist that are "near Sonnet." The hardware to actually run them is the bottleneck, and it is a HUGE financial/practical bottleneck. If you had a $10K all-in budget, it isn't actually insane for this class of model, and the sky really is the limit (again to reduce quantization and or increase tokens/second).
PS - And electricity costs are non-trivial for 4x 3090s or 4x 5090s.