sbierwagen 16 hours ago

From the author's writeup:

>the final pre-trained model came out to about 340 million parameters, and had a final validation bpb of 0.973. The pretraining process took about five hours on-chip, and cost maybe $35. I had my pretrained model, trained in 6496 steps. Things were proceeding swiftly, and cheaply!
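For context on the quoted 0.973 figure: bits per byte (bpb) is a tokenizer-independent way to report language-model loss. A minimal sketch of the conversion, assuming a validation loss measured in nats per token and an average bytes-per-token figure (both numbers below are illustrative, not taken from the writeup):

```python
import math

def bits_per_byte(nats_per_token: float, bytes_per_token: float) -> float:
    """Convert mean cross-entropy loss (nats/token) to bits per byte."""
    bits_per_token = nats_per_token / math.log(2)  # nats -> bits
    return bits_per_token / bytes_per_token        # normalize by token length

# Hypothetical example: a loss of 2.7 nats/token with tokens covering
# ~4 bytes on average works out to roughly 0.974 bpb, the same ballpark
# as the quoted validation figure.
print(round(bits_per_byte(2.7, 4.0), 3))
```

Because bpb normalizes by bytes rather than tokens, it lets you compare models that use different tokenizers.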

GPT-3 had 175,000 million parameters. The smallest of the Gemma 4 models released today clocks in at 5,000 million parameters, and I would bet that Google trained it for more than five hours. This one is just too small and not trained for long enough: a fun art project, but not a functional LLM.