Smaller models require a lot more direction, a.k.a system prompt engineering, and sometimes custom wrappers . For example Gemma models are very eager to generate code even if you tell them not to.