▲ | kelsey98765431 5 days ago | |
FYI it also supports pre-training, reward model training, and RL, not just supervised fine-tuning (SFT). My team built a managed training solution that runs on top of LLaMA Factory, and it's excellent and well supported. You will need pretty serious hardware to get good results out of larger models, think 8x H200. For people at home I would look at doing an SFT of Gemma 3 270M or maybe Qwen3 1.7B, but keep in mind you have to fit the dataset in memory as well as the model and KV cache. cheers | ||
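For a sense of what an at-home SFT run looks like, here is a minimal sketch using Hugging Face TRL's SFTTrainer rather than LLaMA Factory itself; the model and dataset names are only examples and exact config fields can vary by TRL version.

    # Minimal at-home SFT sketch with TRL (pip install trl datasets).
    # Assumes a single consumer GPU; model and dataset names are illustrative.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    train_ds = load_dataset("trl-lib/Capybara", split="train")  # example chat dataset

    trainer = SFTTrainer(
        model="google/gemma-3-270m",            # small enough to fine-tune at home
        train_dataset=train_ds,
        args=SFTConfig(
            output_dir="gemma3-270m-sft",
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            num_train_epochs=1,
        ),
    )
    trainer.train()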
▲ | spagettnet 5 days ago | parent | next [-] | |
Depends on your goals, of course. But it's worth mentioning there are plenty of narrowish tasks (think text-to-SQL and other less general language tasks) where Llama 8B, Phi-4 (14B), or even models up to 30B with quantization can be trained on 8x A100 with great results. Plus these smaller models can be served on a single A100 or even an L4 with post-training quantization, with wicked fast generation thanks to the lighter model. On a related note, at what point are people going to get tired of waiting 20s for an LLM to answer their questions? I wish it were more common to use smaller models when they're sufficient. | ||
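As a rough illustration of the single-GPU serving point above, here is a hedged sketch of loading a mid-size model with 4-bit post-training quantization via transformers + bitsandbytes; the model id is just an example, and actual memory use still depends on sequence length and batch size.

    # Sketch: 4-bit post-training quantization so a ~14B model fits on one GPU.
    # Assumes transformers, accelerate, and bitsandbytes are installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "microsoft/phi-4"  # example model from the comment above
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )

    prompt = "Write a SQL query that counts orders per customer."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))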
▲ | zwaps 3 days ago | parent | prev [-] | |
Why do you have to keep the dataset in memory? We've had good distributed streaming datasets for a while now, no?
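For reference, a minimal sketch of dataset streaming with the Hugging Face datasets library (the corpus name is illustrative): examples are pulled lazily, so the full dataset never has to sit in RAM.

    # Sketch: streaming a dataset so it never has to be fully loaded into memory.
    # Assumes the datasets library is installed; the corpus is just an example.
    from datasets import load_dataset

    stream = load_dataset("allenai/c4", "en", split="train", streaming=True)
    for i, example in enumerate(stream):
        print(example["text"][:80])  # peek at a few records only
        if i >= 2:
            break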