Remix.run Logo
samsartor 4 days ago

And remember, optimization problems can be _incredibly_ high-dimensional. A 7B parameter LLM is a 7-billion-dimensional optimization landscape. A grid-search with a resolution of 10 (ie 10 samples for each dimension) would requre evaluating the loss function 10^(7*10^9) times. That is, the number of evaluations is a number with 7B digits.