accrual 16 hours ago
I wondered the same. Perhaps a local model loaded on a 16GB or 24GB graphics card would perform well too. It would have to be a quantized/distilled model, but it might be sufficient, especially with some additional training as you mentioned.
jszymborski 16 hours ago | parent | next
If Qwen 0.6B is suitable, then it could fit in 576MB of VRAM[0].
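For illustration, a minimal sketch of loading a small model in 4-bit on the GPU, assuming the Hugging Face transformers and bitsandbytes libraries (the Qwen/Qwen3-0.6B checkpoint here is just one plausible choice):

  # Sketch: load a small causal LM with 4-bit quantized weights.
  # Model id and quantization settings are assumptions, not from the thread.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "Qwen/Qwen3-0.6B"  # any comparably small checkpoint works

  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,                     # 4-bit weights: roughly 4x less VRAM than fp16
      bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16 for speed
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=quant_config,
      device_map="auto",  # place layers on the available GPU automatically
  )

  prompt = "Summarize: local models can run on consumer GPUs."
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(out[0], skip_special_tokens=True))

The arithmetic checks out: 0.6B parameters at 4 bits each is on the order of 300MB for weights, so with activations and KV cache it plausibly fits well under 1GB of VRAM.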
otabdeveloper4 14 hours ago | parent | prev
16GB is way overkill for this.