I feel like most of this recent Autoresearch trend boils down to reinventing hyper-parameter tuning. Is the SOTA still Bayesian optimization when given a small cluster? It was ~3 years ago when I was doing this kind of work, haven't kept up since then.

Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!

▲

karpathy 2 hours ago | parent | next [-]

Wrong and short-sighted take, given that the LLM explores serially, learning along the way, and can use tools and change code arbitrarily. It seems to currently default to something resembling hyperparameter tuning in the absence of more specific instructions. I briefly considered calling the project “autotune” at first, but I think “autoresearch” will prove to be the significantly more appropriate name.

▲

achierius 2 hours ago | parent | next [-]

Out of curiosity, what sort of things have you seen it do that better fit 'autoresearch' than 'autotune' thus far? Optimizations it made that wouldn't have been surfaced by an autotune system, I suppose.

▲

karpathy 23 minutes ago | parent [-]

The most recent round of autoresearch (round 2), which decreased "time to GPT-2" from 1.8 hours to 1.65 hours, had some examples. I adjusted the program.md to "look at modded nanogpt project and draw inspirations from there for things to try" and it came back with a bunch of tuning, but it also tried and implemented new architecture changes, some of which actually helped, including the smear gate and the backout skip connection. These are not just hyperparameters; they are new PyTorch code. I'm now working on a more general system that can have a queue of ideas that could be sourced from arXiv papers, GitHub repos, etc.

▲

kraddypatties 2 hours ago | parent | prev | next [-]

I can believe that in the long run.

Does the agent have access to arXiv (a brief skim of the README didn't turn up an answer)? If not, it could be that the current approach of relying only on the model's weights is what produces the perceived local optimum of hyperparameter tuning.

Anecdotally, we built a little MCP for arXiv to help with our internal research, and noticed a significant boost in the diversity of methods (architecture or otherwise) that Claude and friends were able to reference.

▲

corndoge 2 hours ago | parent | prev | next [-]

Would you say it's fair to describe autoresearch as a form of neural architecture search? I am curious what you think the core differences are between them.

▲

an hour ago | parent | prev | next [-]

[deleted]

▲

westurner 2 hours ago | parent | prev | next [-]

Is there a cost to converge? And how much does it vary with the random seed?

Re: OpenCogPrime:EconomicAttentionAllocation https://news.ycombinator.com/item?id=45518074 and something about eWASM (edit) https://news.ycombinator.com/item?id=47171887 .. from https://news.ycombinator.com/item?id=46825026 re: eWASM and costed opcodes for agent efficiency

▲

saberience an hour ago | parent | prev [-]

Have you actually used LLMs for non-trivial tasks? They are still incredibly bad when it comes to actually hard engineering work and they still lie all the time; it's just gotten harder to notice, especially if you're just letting it run all night and generate reams of crap.

Most people are optimizing for terrible benchmarks and then don't really understand what the model did anyway and just assume it did something good. It's the blind leading the blind basically, and a lot of people with an AI-psychosis or delusion.

▲

nfg an hour ago | parent [-]

Do you realise who you’re replying to?

▲

_menelaus 20 minutes ago | parent [-]

lolololol

▲

ipsum2 3 hours ago | parent | prev | next [-]

Hyperparam tuning that has better intuition and can incorporate architecture changes automatically. It won't invent something completely new though.

▲

kraddypatties 2 hours ago | parent [-]

Hm, that's fair. It does feel like there's low-hanging fruit in combining "old school" methods for conducting a hyperparameter sweep efficiently _with_ the higher-level architecture edit ability of Autoresearch. Probably would cut the number of runs down by a significant number (as far as I can tell it's doing a grid search once it decides to mess with a knob or section of the architecture).
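To make the "old school" idea concrete, here is a minimal sketch of one such method, successive halving, which spends a small budget on every candidate and only gives more budget to the best fraction. This is not what Autoresearch does; `toy_loss`, the learning-rate range, and the budget units are all hypothetical stand-ins for a real training run.

```python
import math
import random


def toy_loss(lr, budget):
    """Hypothetical stand-in for a training run (lower is better).

    The loss is smallest near lr = 1e-2 and improves as budget grows.
    """
    return (math.log10(lr) + 2.0) ** 2 + 1.0 / budget


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configs each round, then grow the budget by eta.

    Returns (best_config, number_of_runs, total_budget_spent).
    """
    survivors = list(configs)
    budget = min_budget
    runs = 0
    cost = 0
    while len(survivors) > 1:
        # Score every surviving config at the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        runs += len(survivors)
        cost += len(survivors) * budget
        # Discard the worst configs and give the rest more budget.
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], runs, cost


rng = random.Random(0)
# 16 learning rates sampled log-uniformly from [1e-5, 1].
candidates = [10 ** rng.uniform(-5, 0) for _ in range(16)]
best_lr, total_runs, total_cost = successive_halving(candidates, toy_loss)
print(f"best lr = {best_lr:.2e}, runs = {total_runs}, budget spent = {total_cost}")
```

With 16 candidates and `eta=2` this does 30 short runs costing 64 budget units, versus 128 units to run every candidate at the final budget of 8. Whether the savings carry over to real training depends on how well early-budget rankings predict final ones, which the toy loss here assumes by construction.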

▲

3 hours ago | parent | prev [-]

[deleted]