spagettnet 5 days ago

depends on your goals of course. but worth mentioning there are plenty of narrowish tasks (think text-to-sql, and other less general language tasks) where llama 8b or phi-4 (14b) or even up to 30b with quantization can be trained on 8xa100 with great results. plus these smaller models can be served on a single a100 or even an L4 with post-training quantization, with wicked fast generation thanks to the lighter model.
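
as a rough sketch of the serving side (assumed model id and 4-bit settings, not something the comment spells out), loading a ~14b model with post-training 4-bit quantization so it fits on one gpu looks something like this:

    # minimal sketch: 4-bit post-training quantization via transformers + bitsandbytes
    # "microsoft/phi-4" is an assumed hugging face repo id for illustration
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "microsoft/phi-4"

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize weights to 4 bits at load time
        bnb_4bit_quant_type="nf4",              # normal-float 4-bit, common default
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # place the quantized model on the single available gpu
    )

    prompt = "translate to sql: top 5 customers by revenue"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

at 4 bits a 14b model's weights land around 7-8 GB, which is why a single a100 or L4 is plenty for the narrow tasks above.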

on a related note, at what point are people going to get tired of waiting 20s for an llm to answer their questions? i wish it were more common for smaller models to be used when sufficient.