jephs 8 hours ago
I'm terribly sorry, but scaling curves or GTFO. Any random pile of linear algebra works fine-ish at small scales. Very few random piles of linear algebra push the Pareto envelope at large scales.
WithinReason an hour ago
Not everyone can afford millions to publish a paper.
ketchup32613 7 hours ago
Do you want to see scaling curves with respect to data and parameter size? I agree that 1.2B parameters and 10B tokens are not representative, but what scale of parameters and dataset sizes would be convincing?