Remix clone Hacker News

	▲	adrian_b 3 hours ago
		As has been discussed in a few recent threads on HN, whenever a new model is released, running it successfully may require changes in the inference backends, such as llama.cpp. There are 2 main reasons. The first is the tokenizer: new tokenizer definitions may be mishandled by older tokenizer parsers. The second is that each model may implement tool invocations differently, e.g. by using different delimiter tokens and different text layouts for describing the parameters of a tool invocation. As a result, running the Gemma-4 models ran into various problems during the first days after their release, especially for the dense 31B model. Solving these problems required both a new version of llama.cpp (and of other inference backends) and updates to the model's chat template and tokenizer configuration files. So anyone who wants to use Gemma-4 should update to the latest version of llama.cpp and re-download the latest model files from Hugging Face, because the most recent updates were published only a couple of days ago.
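
To make the second reason concrete, here is a minimal sketch of why a backend needs a per-model tool-call parser. Everything in it is an assumption made for illustration: the two formats (`format_a`, `format_b`), their delimiters, and the `parse_tool_calls` helper are hypothetical and are not the actual delimiters of Gemma-4 or the actual code of llama.cpp.

```python
import json
import re

# Hypothetical tool-call layouts, invented for this example only.
# Real models each define their own delimiter tokens and argument layout.
TOOL_CALL_PATTERNS = {
    # e.g. <tool_call>get_weather({"city": "Paris"})</tool_call>
    "format_a": re.compile(
        r"<tool_call>(?P<name>\w+)\((?P<args>\{.*?\})\)</tool_call>", re.DOTALL
    ),
    # e.g. [CALL] get_weather | {"city": "Paris"} [/CALL]
    "format_b": re.compile(
        r"\[CALL\] (?P<name>\w+) \| (?P<args>\{.*?\}) \[/CALL\]", re.DOTALL
    ),
}


def parse_tool_calls(text, fmt):
    """Return a list of (tool_name, arguments) pairs found in the model output."""
    pattern = TOOL_CALL_PATTERNS.get(fmt)
    if pattern is None:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return [
        (m.group("name"), json.loads(m.group("args")))
        for m in pattern.finditer(text)
    ]


output_a = '<tool_call>get_weather({"city": "Paris"})</tool_call>'
output_b = '[CALL] get_weather | {"city": "Paris"} [/CALL]'

calls_a = parse_tool_calls(output_a, "format_a")
calls_b = parse_tool_calls(output_b, "format_b")
# A parser written for the other layout finds nothing and raises no error.
missed = parse_tool_calls(output_b, "format_a")
```

In this sketch, feeding the output of one format to the parser for the other returns an empty list rather than failing. That is one plausible way a backend that has not yet been updated for a new model could treat a tool call as ordinary text.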