Oh man, that's funny to see one of my grad school class projects in that list. Takes me back. :-)

From that experience: The LLM is likely to do drastically better. Most of the prior work, mine included, took a genetic algorithm approach, but an LLM is more likely to make coherent multi-instruction modifications.

It's a shame they didn't compare against some of the standard Core War benchmarks as a way to facilitate comparisons to prior work, though. Makes it hard to say for sure that they're better. https://corewar.co.uk/bench.htm

jacquesm a day ago | parent [-]

I'm not sure if that will hold up. The LLM is not going to do anything random, and randomness is actually a powerful component that makes original output possible.

kyralis a day ago | parent [-]

I wonder if a combination would be useful. Use an actual GA to do the mutation, and then let an LLM "fix" each mutated child.
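
Something like this is what I have in mind, as a rough sketch only. Everything here is made up for illustration: the toy opcode list, `toy_repair` (a stand-in for the LLM call, which would really be a prompt asking the model to make the mutated warrior coherent again), and `toy_fitness` (a real setup would score warriors by running matches in a MARS simulator).

```python
import random
from typing import Callable, List

Program = List[str]  # one Redcode-like instruction per entry

# Hypothetical toy instruction set, just for illustration.
OPCODES = ["MOV", "ADD", "JMP", "DAT", "SPL"]


def mutate(program: Program, rng: random.Random) -> Program:
    """Classic GA point mutation: replace one random instruction."""
    child = list(program)
    i = rng.randrange(len(child))
    child[i] = f"{rng.choice(OPCODES)} {rng.randint(-5, 5)}, {rng.randint(-5, 5)}"
    return child


def evolve(
    seed: Program,
    fitness: Callable[[Program], float],
    repair: Callable[[Program], Program],
    generations: int = 20,
    children: int = 8,
    rng_seed: int = 0,
) -> Program:
    """Mutate randomly, let `repair` (the LLM stand-in) fix each child, keep the best."""
    rng = random.Random(rng_seed)
    best = seed
    for _ in range(generations):
        candidates = [repair(mutate(best, rng)) for _ in range(children)]
        candidates.append(best)  # elitism: never lose the current best
        best = max(candidates, key=fitness)
    return best


def toy_repair(program: Program) -> Program:
    """Placeholder for the LLM call: only replaces a leading DAT (instant death)."""
    if program and program[0].startswith("DAT"):
        return ["MOV 0, 1"] + program[1:]
    return program


def toy_fitness(program: Program) -> float:
    """Placeholder score: counts MOV and SPL instructions."""
    return sum(1.0 for ins in program if ins.startswith(("MOV", "SPL")))
```

The randomness stays entirely in `mutate`; the repair step only gets to clean up what the GA proposes, so the search can still wander into places the LLM wouldn't go on its own.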

jacquesm 19 hours ago | parent [-]

Could be. But the interesting thing is that all you can do here is optimize. Random chance is - like attention ;) - all you need.