ks2048 4 days ago
When you look at a 2D surface, you directly observe all the values on that surface. For a loss function, the value at each point must be computed. You can compute them all, "look at" the surface, and directly choose the lowest point; that is called a grid search. In high dimensions, there are just way too many "points" to compute.
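To make the grid search idea concrete, here is a minimal sketch in Python. The names `grid_search` and `toy_loss`, the bounds, and the resolution are all made up for illustration; the point is only that every grid point costs one loss evaluation.

```python
import itertools


def grid_search(loss, bounds, resolution):
    """Evaluate `loss` on a regular grid and return the lowest point found.

    bounds: one (low, high) pair per dimension.
    resolution: number of samples per dimension (must be >= 2).
    """
    # Build evenly spaced sample positions along each axis.
    axes = [
        [lo + (hi - lo) * i / (resolution - 1) for i in range(resolution)]
        for lo, hi in bounds
    ]
    best_point, best_value = None, float("inf")
    evaluations = 0
    # itertools.product visits resolution ** len(bounds) grid points.
    for point in itertools.product(*axes):
        value = loss(point)
        evaluations += 1
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value, evaluations


def toy_loss(p):
    # Hypothetical 2D bowl with its minimum at (1, -2).
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best_point, best_value, evaluations = grid_search(
    toy_loss, [(-5.0, 5.0), (-5.0, 5.0)], resolution=11
)
print(best_point, best_value, evaluations)
```

With 2 dimensions and 11 samples per axis this is only 121 evaluations. The count is `resolution ** dimensions`, which is what blows up as dimensions are added.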
samsartor 4 days ago
And remember, optimization problems can be _incredibly_ high-dimensional. A 7B-parameter LLM is a 7-billion-dimensional optimization landscape. A grid search with a resolution of 10 (i.e. 10 samples for each dimension) would require evaluating the loss function 10^(7*10^9) times. That is, the number of evaluations is a number with about 7 billion digits.
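The digit count can be checked without ever building the number itself. This is a small sketch; the function name `grid_evaluation_digits` is hypothetical, and it relies on floating-point `log10`, so treat it as an estimate rather than exact arithmetic for awkward inputs.

```python
import math


def grid_evaluation_digits(dimensions, resolution):
    """Decimal digits in resolution ** dimensions, computed via logarithms.

    Uses floor(dimensions * log10(resolution)) + 1, so the huge number
    itself is never materialized.
    """
    return math.floor(dimensions * math.log10(resolution)) + 1


# 7 billion dimensions, 10 samples per dimension (the case from the comment).
digits = grid_evaluation_digits(7_000_000_000, 10)
print(digits)
```

Strictly, 10^(7*10^9) is a 1 followed by 7 billion zeros, so it has one digit more than 7 billion.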
cubefox 3 days ago
What about sampling at low resolution? If the hills and valleys aren't too close together, this should give a good indication of where the global minimum is.
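As a rough illustration of the low-resolution idea, here is a one-dimensional sketch that samples coarsely, then re-samples around the best point found. Everything here (`coarse_to_fine`, `wide_valley`, the sample counts) is hypothetical, and it only works under the stated assumption that the valley is wide compared to the sample spacing. It also says nothing about the cost in many dimensions, where even a few samples per axis multiply out.

```python
def coarse_to_fine(loss, lo, hi, samples=5, rounds=20):
    """Repeatedly sample an interval coarsely and zoom in on the best sample.

    Assumes the minimum sits in a valley wider than the sample spacing;
    a narrow valley between two samples would be missed entirely.
    """
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / (samples - 1)
        xs = [lo + step * i for i in range(samples)]
        best = min(xs, key=loss)
        # Shrink the search window to one step on either side of the best sample.
        lo, hi = best - step, best + step
    return best


def wide_valley(x):
    # Hypothetical smooth loss with a single broad valley at x = 0.3.
    return (x - 0.3) ** 2


estimate = coarse_to_fine(wide_valley, -4.0, 4.0)
print(estimate)
```

Each round costs only `samples` evaluations in 1D, but the same scheme on a grid costs `samples ** dimensions` per round.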