Aurornis 5 hours ago
> Cloud hardware is not inherently more "proper" than what's being proposed here

Cloud hardware can run the original model. Quantization reduces quality, and the drop to Q4 is not trivial. Cloud hardware is also massively faster in both time to first token and token generation speed.

> there's nothing wrong per se about targeting slower inference speeds in a local single-user context. If that's what the user wants and expects then it's fine

Most people working interactively with an LLM would suffer from slower turns.
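To make the quality concern concrete, here is a minimal sketch of symmetric 4-bit round-to-nearest quantization in plain Python. This is a toy illustration, not any real library's Q4 scheme; the point is that every weight picks up rounding error of up to half a quantization step:

```python
# Toy symmetric int4 ("Q4"-style) quantization of a weight vector.
# Hypothetical scheme for illustration -- real formats (e.g. llama.cpp's
# Q4 variants) use per-block scales and other refinements.

def quantize_q4(weights):
    # int4 values span [-8, 7]; pick a scale so the largest weight maps to 7
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07, 0.33]
q, scale = quantize_q4(weights)
recovered = dequantize(q, scale)

# worst-case round-to-nearest error is half a quantization step (scale / 2)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Summed over billions of weights, this per-weight error is what shows up as measurable quality loss at Q4.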
zozbot234 4 hours ago | parent
> Cloud hardware can run the original model. Quantization will reduce quality.

New models are often released in quantized format to begin with; this is true of both Kimi and the new DeepSeek V4 series. There is no "original model": the model is produced with Quantization Aware Training (QAT).
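For context on why QAT avoids the post-hoc quality drop: the quantization happens inside the training loop, so the forward pass sees quantized weights while gradients update a full-precision shadow copy (the straight-through estimator). A toy one-parameter sketch, with all numbers invented for illustration:

```python
# Toy quantization-aware training of a single weight.
# The forward pass uses the int4-gridded weight; the gradient is applied
# to the full-precision shadow weight (straight-through estimator).

def fake_quant(w, scale):
    # snap to the int4 grid [-8, 7] * scale
    return max(-8, min(7, round(w / scale))) * scale

scale, lr = 0.1, 0.05
w = 0.0                      # full-precision shadow weight
target = 0.37                # toy regression target for y = w_q * x

for _ in range(300):
    x = 1.0
    y = fake_quant(w, scale) * x
    grad_wq = 2.0 * (y - target) * x   # dL/dw_q for L = (y - target)^2
    w -= lr * grad_wq                  # STE: pass the gradient straight to w

w_q = fake_quant(w, scale)   # ends on a grid point near the target
```

Because training optimizes the quantized forward pass directly, the released quantized weights *are* the model at full quality; there is no higher-precision original being degraded.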