Remix.run Logo
jpcompartir 15 hours ago

There are better techniques for hyper-parameter optimisation, right? I fear I have missed something important, why has Autoresearch blown up so much?

The bottleneck in AI/ML/DL is always data (volume & quality) or compute.

Does/can Autoresearch help improve large-scale datasets? Is it more compute efficien than humans?

bonoboTP 14 hours ago | parent | next [-]

There is a field of AutoML, with its own specialized academic literature and libraries that tried to achieve this type of thing but didn't work very well in practice.

Years ago there were big hopes about bayesian hyperparameter optimization, predicting performance with Gaussian processes etc, hyperopt library, but it was often starting wasteful experiments because it really didn't have any idea what the parameters did. People mostly just do grid search and random search with a configuration that you set up by intuition and experience. Meanwhile LLMs can see what each hyperparameter does, it can see what techniques and settings have worked in the literature, it can do something approximating common sense regarding what has a big enough effect. It's surprisingly difficult to precisely define when a training curve has really flattened for example.

So in theory there are many non-LLM approaches but they are not great. Maybe this is also not so great yet. But maybe it will be.

nextos 15 hours ago | parent | prev | next [-]

AFAIK, it's a bit more than hyper-parameter tuning as it can also make non-parametric (structural) changes.

Non-parametric optimization is not a new idea. I guess the hype is partly because people hope it will be less brute force now.

gwerbin 15 hours ago | parent | next [-]

It's an LLM-powered evolutionary algorithm.

ainch 15 hours ago | parent | next [-]

I'd like see a system like this take more inspiration from the ES literature, similar to AlphaEvolve. Let's see an archive of solutions, novelty scoring and some crossover rather than purely mutating the same file in a linear fashion.

nextos 14 hours ago | parent [-]

Exactly, that's the way forward.

There are lots of old ideas from evolutionary search worth revisiting given that LLMs can make smarter proposals.

UncleOxidant 14 hours ago | parent | prev [-]

That was my impression. Including evolutionary programming which normally would happen at the AST level, with the LLM it can happen at the source level.

coppsilgold 15 hours ago | parent | prev [-]

Perhaps LLM-guided Superoptimization: <https://en.wikipedia.org/wiki/Superoptimization>

I recall reading about a stochastic one years ago: <https://github.com/StanfordPL/stoke>

frumiousirc 14 hours ago | parent | prev | next [-]

> There are better techniques for hyper-parameter optimisation, right?

Yes, for example "swarm optimization".

The difference with "autoresearch" (restricting just to the HPO angle) is that the LLM may (at least we hope) beat conventional algorithmic optimization by making better guesses for each trial.

For example, perhaps the problem has an optimization manifold that has been studied in the past and the LLM either has that study in its training set or finds it from a search and learns the relative importance of all the HP axes. Given that, it "knows" not to vary the unimportant axes much and focus on varying the important ones. Someone else did the hard work to understand the problem in the past and the LLM exploits that (again, we may hope).

janalsncm 13 hours ago | parent | prev | next [-]

> The bottleneck in AI/ML/DL is always data (volume & quality) or compute.

Not true at all. The whole point of ML is to find better mappings from X to Y, even for the same X.

Many benchmarks can’t be solved by just throwing more compute at the problem. They need to learn better functions which traditionally requires humans.

And sometimes an algorithm lets you tap into more data. For example transformers had better parallelism than LSTMs -> better compute efficiency.

jpcompartir a few seconds ago | parent [-]

Fair push back, but I do think the LSTM vs Transformers point kinda supports my position in the limit, not refutes. Once the compute bottleneck is removed, LSTMs scale favourably. https://arxiv.org/pdf/2510.02228

So the bottleneck was compute. Which is compatible with 'data or compute'. But to accept your point, at the time the algorothmic advances were useful/did unlock/remove the bottleneck.

A wider point is that eventually (once compute and data are scaled enough) the algorithms are all learning the same representations: https://arxiv.org/pdf/2405.07987

And of course the canon: https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dat... http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Scaling compute & data > algorithmic cleverness

hun3 15 hours ago | parent | prev [-]

> There are better techniques for hyper-parameter optimisation, right?

There always are. You need to think about what those would be, though. Autoresearch outsources the thinking to LLMs.