nico 3 days ago:
Has anyone done something like this but with Apple silicon instead of a graphics card? Training a small LLM on an M2 through M5?
muricula 2 days ago:
I've played with something similar on my M1 using Apple's MLX framework. The problem is that I'm compute bound: I've never managed to get my M1 Max's GPU to process more than ~7.8k tokens per second at bf16 precision, so training a 112M-parameter model on ~20 billion tokens would take about 30 days.

One solution is to reduce the scope of the problem -- train on a smaller, less diverse dataset such as TinyStories, a collection of roughly 1 billion tokens of ChatGPT-generated children's stories. After about 40 hours, less than one weekend, you'll have a model that can generate mostly grammatical children's stories. If you have a newer Mac and/or an Ultra chip, you'll have more and faster GPU cores and might be able to train on FineWeb or a similar larger, more diverse dataset.
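For anyone who wants to try this, here's a rough sketch of what a single MLX training step looks like. The tiny embedding-plus-MLP stand-in model, vocab size, and batch shape are made-up placeholders rather than the 112M-parameter setup above, but the value_and_grad / update / eval pattern is the standard MLX training loop, and timing a few steps like this is an easy way to estimate your own tokens-per-second budget:

    # Minimal MLX next-token training step on Apple silicon (illustrative only).
    import time
    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    class TinyLM(nn.Module):
        """Embedding -> MLP -> vocab logits; a toy stand-in for a real transformer."""
        def __init__(self, vocab_size=8192, dims=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dims)
            self.hidden = nn.Linear(dims, dims)
            self.out = nn.Linear(dims, vocab_size)

        def __call__(self, tokens):
            x = self.embed(tokens)      # (batch, seq, dims)
            x = nn.relu(self.hidden(x))
            return self.out(x)          # (batch, seq, vocab)

    def loss_fn(model, inputs, targets):
        return nn.losses.cross_entropy(model(inputs), targets, reduction="mean")

    model = TinyLM()
    optimizer = optim.AdamW(learning_rate=3e-4)
    loss_and_grad = nn.value_and_grad(model, loss_fn)

    # Fake batch of token ids just to time a step; real training would stream
    # tokenized text (e.g. TinyStories) here.
    batch, seq_len = 8, 512
    inputs = mx.random.randint(0, 8192, (batch, seq_len))
    targets = mx.random.randint(0, 8192, (batch, seq_len))

    # Time a few steps; the first includes the lazy-graph build, so skip it.
    for step in range(4):
        start = time.time()
        loss, grads = loss_and_grad(model, inputs, targets)
        optimizer.update(model, grads)
        mx.eval(model.parameters(), optimizer.state)  # force the lazy computation to run
        if step > 0:
            tps = batch * seq_len / (time.time() - start)
            print(f"step {step}: loss {loss.item():.3f}, ~{tps:,.0f} tokens/s")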
goosers 2 days ago:
I'm experimenting with this, but using the CPU rather than the GPU. I'm finishing up writing the series now; it's focused more on understanding the architecture than on building a useful model. Mine requires talking to it in the language of Shakespeare and getting replies in kind -- a proof of concept more than a useful tool. https://www.tag1.com/white-paper/part1-tokenization-building...

I was interested in focusing on repeatability and on using text sources anyone can legally obtain. It's been fascinating, but after much experimentation it's clear that working with a larger and more diverse body of text would be extremely helpful.
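To make the tokenization step concrete, here's a rough character-level sketch over a locally saved public-domain text. The shakespeare.txt file name and the character-level approach are assumptions for illustration, not necessarily what the linked series does:

    # Toy character-level tokenizer over a public-domain text file (illustrative).
    from pathlib import Path

    # Hypothetical local copy of a legally obtainable corpus.
    text = Path("shakespeare.txt").read_text(encoding="utf-8")

    # Build the vocabulary from every distinct character in the corpus.
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}

    def encode(s: str) -> list[int]:
        return [stoi[ch] for ch in s]

    def decode(ids: list[int]) -> str:
        return "".join(itos[i] for i in ids)

    ids = encode("To be, or not to be")
    print(f"{len(chars)} symbols in vocab; first ids: {ids[:10]}")
    assert decode(ids) == "To be, or not to be"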