cpburns2009 | 2 hours ago
That's good to know. I haven't exceeded 120k context yet. Maybe I'll bite the bullet and try Q6 or Q8. Any of the coder-next quants larger than UD-Q4_K_XL take forever to load, especially with ROCm. I think there's some sort of autotuning or fitting going on in llama.cpp.
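For what it's worth, here's a minimal sketch of how I'd try a bigger quant with a large context window through the llama-cpp-python bindings; the model filename and the exact context/offload values are just placeholders, not what I'm actually running:

    from llama_cpp import Llama

    # Hypothetical GGUF path and settings; adjust to your quant and VRAM.
    llm = Llama(
        model_path="coder-next-Q6_K.gguf",  # placeholder filename for a Q6 quant
        n_ctx=131072,       # ~128k context window
        n_gpu_layers=-1,    # offload all layers (e.g. on a ROCm/HIP build of llama.cpp)
    )

    out = llm("Write a hello-world in Rust.", max_tokens=256)
    print(out["choices"][0]["text"])

The slow part in my experience is the initial load/offload step, so whatever llama.cpp is doing at that stage is what I'd want to profile before committing to Q6/Q8.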