| ▲ | coder543 2 days ago |
| There are issues with the chat template right now[0], so tool calling does not work reliably[1]. Every time people try to rush to judge open models on launch day... it never goes well. There are ~always bugs on launch day. [0]: https://github.com/ggml-org/llama.cpp/pull/21326 [1]: https://github.com/ggml-org/llama.cpp/issues/21316 |
|
| ▲ | stavros a day ago | parent | next [-] |
| What causes these? Given how simple the LLM interface is (just completion), why don't teams make a simple, standardized template available with their model release so the inference engine can just read it and work properly? Can someone explain the difficulty with that? |
| |
| ▲ | Yukonv a day ago | parent [-] | | The model does have the format specified, but there is no _one_ standard. For this model it's defined in the tokenizer_config.json [0]. As for llama.cpp, they seem to be using a more type-safe approach to reading the arguments. [0] https://huggingface.co/google/gemma-4-31B-it/blob/main/token... | | |
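| (For context: the `chat_template` field in tokenizer_config.json is a Jinja template that turns a list of messages into the raw prompt string. A minimal sketch of how an inference engine applies one is below; the template string here is a simplified stand-in, not Gemma's actual template.)

```python
import jinja2

# Simplified stand-in for the "chat_template" string an engine would read
# out of tokenizer_config.json; real templates (e.g. Gemma's) are far more
# involved, with tool-call handling -- which is where bugs tend to hide.
chat_template = (
    "{% for message in messages %}"
    "<start_of_turn>{{ message['role'] }}\n"
    "{{ message['content'] }}<end_of_turn>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<start_of_turn>model\n{% endif %}"
)

def render_prompt(messages, add_generation_prompt=True):
    # StrictUndefined raises on missing variables instead of silently
    # rendering empty strings -- one way template bugs slip through.
    env = jinja2.Environment(undefined=jinja2.StrictUndefined)
    return env.from_string(chat_template).render(
        messages=messages, add_generation_prompt=add_generation_prompt
    )

prompt = render_prompt([{"role": "user", "content": "Hello"}])
print(prompt)
```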
| ▲ | stavros a day ago | parent [-] | | Hm, but surely there are converters for such simple formats? I'm confused as to how there can be tool-calling bugs when the model already ships with the template. |
|
|
|
| ▲ | emidoots 2 days ago | parent | prev [-] |
| That PR was just merged |
| |
| ▲ | coder543 2 days ago | parent [-] | | It was just an example of a bug, not that it was the only bug. I’ve personally reported at least one other for Gemma 4 on llama.cpp already. In a few days, I imagine that Gemma 4 support should be in better shape. |
|