sbierwagen 16 hours ago
From the author's writeup:

> the final pre-trained model came out to about 340 million parameters, and had a final validation bpb of 0.973. The pretraining process took about five hours on-chip, and cost maybe $35. I had my pretrained model, trained in 6496 steps. Things were proceeding swiftly, and cheaply!

GPT-3 had 175,000 million parameters. The smallest of the Gemma 4 models released today clocks in at 5,000 million parameters, and I would bet that Google trained it for more than five hours. This model is just too small and not trained for long enough: a fun art project, but not a functional LLM.
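
For a rough sense of scale, here's a minimal back-of-the-envelope sketch. The parameter counts are the ones quoted above, and bpb (bits per byte) converts to per-byte perplexity as 2^bpb, the standard relationship:

    # Scale comparison using the figures quoted in the comment above.
    nano_params  = 340e6      # author's model: ~340 million parameters
    gpt3_params  = 175_000e6  # GPT-3: 175,000 million (175B)
    gemma_params = 5_000e6    # smallest Gemma model cited: 5,000 million (5B)

    print(f"GPT-3 is {gpt3_params / nano_params:.0f}x larger")   # ~515x
    print(f"Gemma is {gemma_params / nano_params:.1f}x larger")  # ~14.7x

    # bits per byte -> per-byte perplexity
    bpb = 0.973
    print(f"bpb {bpb} -> per-byte perplexity {2**bpb:.2f}")      # ~1.96

So even the smallest model in the Gemma family cited here is roughly 15x the size of the author's 340M-parameter model, and GPT-3 is over 500x larger.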