knowaveragejoe 4 days ago
llama2 is pretty old. Ollama also defaults to a fairly aggressive quantization when you use just the base model name like that; I believe it resolves to a 4-bit quant (Q4_K_M or similar), which is fast but costs you some smarts.

My suggestion would be one of the gemma3 models: https://ollama.com/library/gemma3/tags

Picking one whose file size is smaller than your VRAM (or your system RAM if you don't have a dedicated GPU) is a good rule of thumb. But you can always do more with less once you get into the settings for Ollama (or other tools like it).
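For example, a minimal sketch assuming Ollama is installed and that gemma3:12b fits your hardware (the full list of tags, including smaller and more heavily quantized variants, is on the page linked above):

    # pull an explicitly-tagged model rather than relying on the default tag
    ollama pull gemma3:12b

    # inspect what you got, including the quantization level
    ollama show gemma3:12b

    # chat with it
    ollama run gemma3:12b

If 12b is too big for your memory, gemma3:4b and gemma3:1b are the smaller options on that tags page.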