menaerus an hour ago

2-bit quantization? That's a lot of signal being removed. Given how quickly AI models are progressing in capability (still an exponential curve), I won't want to use a 2025 model two years from now, just as I don't want to use llama-3 or an older Anthropic model from 2023 or 2024 today. Newer models are so much better that the gap is very hard to ignore.
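To put a rough number on "a lot of signal", here is a minimal Python sketch. It quantizes synthetic Gaussian weights at 8, 4 and 2 bits and reports the signal-to-quantization-noise ratio in dB. The naive symmetric per-tensor quantizer, the Gaussian weights, and the helper names (`quantize`, `dequantize`, `snr_db`) are all illustrative assumptions, not how any particular model such as DS4 is quantized. Real 2-bit LLM quants typically use per-group scales and other tricks, so this likely overstates the loss.

```python
import math
import random

def quantize(weights, bits):
    """Naive symmetric per-tensor quantizer (illustrative assumption, not any
    specific model's scheme). Returns integer codes and the shared step size."""
    half = 2 ** bits // 2                    # e.g. 2 bits -> codes -2..1 (4 levels)
    max_abs = max(abs(w) for w in weights) or 1.0
    step = max_abs / (half - 0.5)            # outermost level lands exactly on max_abs
    # Nearest level index, clamped to the representable code range.
    codes = [max(-half, min(half - 1, round(w / step - 0.5))) for w in weights]
    return codes, step

def dequantize(codes, step):
    # Levels sit at half-steps, (q + 0.5) * step, so no level is exactly zero.
    return [(q + 0.5) * step for q in codes]

def snr_db(weights, bits):
    """Signal-to-quantization-noise ratio in dB; higher means less signal lost."""
    codes, step = quantize(weights, bits)
    recon = dequantize(codes, step)
    signal = sum(w * w for w in weights)
    noise = sum((w - r) ** 2 for w, r in zip(weights, recon))
    return 10 * math.log10(signal / noise)

if __name__ == "__main__":
    rng = random.Random(0)                   # fixed seed for reproducibility
    weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (8, 4, 2):
        print(f"{bits}-bit: {snr_db(weights, bits):.1f} dB")
```

With only four levels left at 2 bits, most weights collapse onto the two innermost levels, and the ratio drops far below the 4- and 8-bit cases.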

Only if and when progress in AI models slows down will it, IMHO, become feasible to design specialized HW for general-purpose consumption and general-purpose workloads.

nmfisher 23 minutes ago

Opus 4.6 was a 2025 model, and many people (myself included) feel that if that's where models peaked, we wouldn't be disappointed.

Even at 2-bit quantization, DS4 is probably on par with a 2024 frontier model. You can run that today on local hardware, and at a minimum, local models are going to keep pace over the next 12-24 months. Even if they don't close the gap with frontier models, they'll still play an important role in the overall pipeline for cost, speed and privacy reasons.

That's without even mentioning the additional capability that something like a Taalas chip churning out 17k tokens/sec could unlock.