▲ 2ndorderthought 3 hours ago
The little qwen36 is at Sonnet level. Kimi 2.6 is about Opus. The one can run on a single GPU in your gaming PC. The other you can run much more cheaply from a provider, or, if you're really wealthy and have lots of GPUs, run yourself. Not sure where DeepSeek 4 sits.
▲ vidarh 2 hours ago
Kimi 2.6 is nowhere near even Sonnet in overall robustness. It can get close when everything goes perfectly, but I have about 1 KLOC of harness code written to work around quirks in Kimi that aren't needed for any other model I've tested, such as infinite tool-call loops and other weirdness. You can do quite a bit with it and never run into those quirks, or you might hit them on every request. It is very sensitive to "confusing" things about its environment in a way Sonnet and Opus are not. Still great value, but they have some way to go.
▲ ryandrake 3 hours ago
Would "lots of GPUs" even help for huge models? Maybe this is exposing my lack of knowledge, but don't you need to keep the whole model and context in a single GPU's VRAM? My understanding is that multiple GPUs help with scaling (you can handle N× as many inference requests simultaneously), but they don't help with serving larger models. If they did, I could jam another GPU in my box and double the size of the model I can serve.
▲ Jabrov 3 hours ago
Yes, multiple GPUs absolutely help with inference, even for a single model instance. Some models are simply too big to fit on the largest available GPU. Check out tensor parallelism.
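The core idea can be sketched in a few lines of NumPy, with the "GPUs" simulated as array shards: split the weight matrix column-wise so no single device ever holds the full matrix, have each device compute its own slice of the output, then gather the slices. (This is only an illustration of the math; real implementations like Megatron-LM do this over NCCL collectives.)

```python
import numpy as np

# Column-parallel linear layer: the weight matrix W is split across
# n_devices "GPUs" (simulated here as array shards). Each device
# holds only one column block, computes a partial output, and the
# partials are concatenated -- no device ever holds W whole.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 512))      # activations (batch, d_in)
W = rng.standard_normal((512, 2048))   # full weight (d_in, d_out)

n_devices = 4
W_shards = np.split(W, n_devices, axis=1)  # one column block per device

# Each "GPU" computes its slice of the output independently...
partials = [x @ w for w in W_shards]

# ...and an all-gather stitches the slices back together.
y_parallel = np.concatenate(partials, axis=1)

# Matches the unsharded computation exactly.
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

Each shard costs 1/n of the weight memory per device, which is exactly why a model that overflows one GPU can still fit across eight.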
▲ ffsm8 3 hours ago
Please don't oversell them. E.g., Kimi K2.6 has a maximum context size of 270k, a quarter of Opus's. The model is fine; I've switched to it entirely for a personal project, but it's not Opus. And no, you're not running them locally unless you're a millionaire. You still need hundreds of GB (500+) of VRAM on your graphics cards, and that's not at the level of consumer electronics. Sure, you can run the quantized models, but then you're at Haiku performance.
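The "500+ GB" figure is easy to sanity-check with back-of-the-envelope arithmetic. Assuming a roughly 1T-parameter model (the exact count for Kimi is an assumption here), weight memory is just parameters × bits-per-parameter, before you even add KV cache or activations:

```python
# Rough VRAM needed just for the weights of a ~1T-parameter model
# (parameter count is an assumed round number, not an official spec).
# Ignores KV cache and activation memory, which only add to the total.

def weight_vram_gb(n_params: float, bits_per_param: float) -> float:
    """Bytes of weight storage, expressed in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

for label, bits in [("fp16", 16), ("fp8", 8), ("int4", 4)]:
    print(f"{label}: {weight_vram_gb(1e12, bits):,.0f} GB")
# fp16: 2,000 GB | fp8: 1,000 GB | int4: 500 GB
```

Even an aggressive 4-bit quantization leaves you around 500 GB, which is why this stays provider-or-datacenter territory rather than a gaming-PC upgrade.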