karpathy 2 hours ago

Wrong and short-sighted take, given that the LLM explores serially, learning along the way, and can use tools and change code arbitrarily. It seems to currently default to something resembling hyperparameter tuning in the absence of more specific instructions. I briefly considered calling the project “autotune” at first, but I think “autoresearch” will prove to be the significantly more appropriate name.

achierius 2 hours ago | parent | next [-]

Out of curiosity, what sort of things have you seen it do that better fit 'autoresearch' than 'autotune' thus far? Optimizations it made that wouldn't have been surfaced by an autotune system, I suppose.

karpathy 27 minutes ago | parent [-]

The most recent round of autoresearch (round 2), which decreased "time to GPT-2" from 1.8 hours to 1.65 hours, had some examples. I adjusted the program.md to "look at the modded-nanogpt project and draw inspiration from there for things to try" and it came back with a bunch of tuning, but it also tried and implemented new architecture changes, some of which actually helped, including the smear gate and the backout skip connection. These are not just hyperparameters, they are new PyTorch code. I'm now working on a more general system that can have a queue of ideas that could be sourced from arXiv papers, GitHub repos, etc.
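(The thread doesn't define the smear gate; in modded-nanogpt the "smear" idea mixes a small, learned fraction of the previous token's activation into the current one through a sigmoid gate. A minimal dependency-free sketch of that idea — the scalar-gate form and all names here are assumptions, not the actual implementation:)

```python
import math

def smear(x, gate_logit=0.0):
    """Blend each token's vector with the previous token's vector,
    scaled by a gate g = sigmoid(gate_logit); g would be learned in
    the real model. x is a list of per-token vectors."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # gate in (0, 1)
    out = [list(x[0])]  # first token has no predecessor, passes through
    for t in range(1, len(x)):
        out.append([cur + g * prev for cur, prev in zip(x[t], x[t - 1])])
    return out

# With gate_logit=0.0 the gate is 0.5, so half of the previous
# token's vector is smeared into the current one:
y = smear([[1.0, 2.0], [3.0, 4.0]], gate_logit=0.0)
```

The appeal is that it's a cheap, almost-free way to give each position a little direct access to its left neighbor outside of attention.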

kraddypatties 2 hours ago | parent | prev | next [-]

I can believe that in the long run.

Does the agent have access to arXiv (a brief skim of the README didn't have an answer)? If not, it could be that the current approach of relying only on the model's weights is what's producing the perceived local optimum of hyperparameter tuning.

Anecdotally, we built a little MCP for arXiv to help with our internal research, and noticed a significant boost in the diversity of methods (architectural or otherwise) that Claude and friends were able to reference.
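(Their MCP isn't public; as a rough illustration of the kind of tool call involved, here's a sketch that builds a search URL against arXiv's public Atom API. The fetching, feed parsing, and MCP wiring are omitted, and the function name is made up:)

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public Atom feed endpoint

def arxiv_query_url(terms, start=0, max_results=10):
    """Build an arXiv API query URL that ANDs together search terms
    across all fields. The agent's tool would fetch this URL and
    parse the returned Atom XML for titles/abstracts."""
    params = {
        "search_query": " AND ".join(f"all:{t}" for t in terms),
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = arxiv_query_url(["transformer", "gating"], max_results=5)
```

Exposing even this single search primitive as a tool lets the model pull in methods it can't reliably recall from its weights.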

corndoge 2 hours ago | parent | prev | next [-]

Would you say it's fair to describe autoresearch as a form of neural architecture search? I am curious what you think the core differences are between them.

westurner 2 hours ago | parent | prev | next [-]

Is there a cost to converge? And how much does it vary with the random seed?

Re: OpenCogPrime:EconomicAttentionAllocation https://news.ycombinator.com/item?id=45518074 and something about eWASM (edit) https://news.ycombinator.com/item?id=47171887 .. from https://news.ycombinator.com/item?id=46825026 re: eWASM and costed opcodes for agent efficiency

saberience an hour ago | parent | prev [-]

Have you actually used LLMs for non-trivial tasks? They are still incredibly bad when it comes to actually hard engineering work, and they still lie all the time; it's just gotten harder to notice, especially if you're letting one run all night and generate reams of crap.

Most people are optimizing for terrible benchmarks, don't really understand what the model did anyway, and just assume it did something good. It's the blind leading the blind, basically, plus a lot of people with AI psychosis or delusions.

nfg an hour ago | parent [-]

Do you realise who you’re replying to?

emp17344 a few seconds ago | parent | next [-]

Why should we care that he’s famous?

_menelaus 23 minutes ago | parent | prev [-]

lolololol