rprend 2 hours ago

This is not true. API tokens are not sold at a loss, and hardware gets more efficient over time, so serving inference on the same model gets cheaper. Llama 3.1 405B was $6 input / $12 output per million tokens in 2024; by 2026 the same model is $3/$3 per million tokens.

The most intelligent model at a given time is much larger than its predecessor, which is why token costs for GPT-5.5 are higher than for 5.4. But you should expect that two years from now, serving a GPT-5.5-sized model will be cheaper than serving GPT-5.5 today. You should expect an equally intelligent model to be cheaper still, because distillation techniques are effective at reducing the parameter count needed to hit the same benchmark scores.
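
For a rough sense of the rate implied by those Llama numbers: treating the 2024 -> 2026 drop as a constant yearly decline (my framing, not a quoted figure), it comes out to roughly 29% per year cheaper on input tokens and 50% per year on output tokens. Quick sketch:

    # Implied annual price decline for Llama 3.1 405B, using the figures
    # quoted above (2024: $6 in / $12 out per M tokens; 2026: $3 / $3).
    def annual_decline(start_price: float, end_price: float, years: float) -> float:
        """Constant yearly decline rate implied by two prices."""
        return 1 - (end_price / start_price) ** (1 / years)

    for name, p_2024, p_2026 in [("input", 6.0, 3.0), ("output", 12.0, 3.0)]:
        rate = annual_decline(p_2024, p_2026, years=2)
        print(f"{name}: ${p_2024}/M -> ${p_2026}/M, ~{rate:.0%} cheaper per year")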