VikingCoder 2 days ago

I find myself wanting genetic algorithms to be applied to try to develop and improve these structures...

But I always want Genetic Algorithms to show up in any discussion about neural networks...

EvanAnderson 2 days ago | parent | next [-]

I have a real soft spot for the genetic algorithm as a result of reading Levy's "Artificial Life" when I was a kid. Given my poor math education, the analogy to biological life is more approachable than neural networks are. I can grok crossover and mutation pretty easily. Backpropagation is too much for my little brain to handle.

VikingCoder a day ago | parent | next [-]

In grad school, I wrote an ant simulator. There was a 2D grid of squares. I put ant food all over it, in hard-coded locations. Then I had a neural network for an ant. The inputs were "is there any food to the left? to the diagonal left? straight ahead? to the diagonal right? to the right?" The outputs were "turn left, move forward, turn right."

Then I had a multi-layer network - I don't remember how many layers.

Then I was using a simple Genetic Algorithm to try to set the weights.

Essentially, it was like breeding up a winner for the snake game - but you always know where all of the food is, and the ant always started in the same square. I was trying to maximize the score for how many food items the ant would eventually find.

In retrospect, it was pretty stupid. Too much of it was hard-coded, and I didn't have nearly enough middle layers to do anything really interesting. And I was essentially coming up with a way to avoid doing backpropagation.
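
Today the whole thing would fit in a short script, something like this sketch (not my original code; the grid size, layer sizes, and mutation-only GA are stand-ins):

    import numpy as np

    GRID, STEPS = 32, 200
    DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]  # 8 headings
    rng = np.random.default_rng(0)
    FOOD = set(map(tuple, rng.integers(0, GRID, size=(60, 2)).tolist()))  # hard-coded food squares

    def run_ant(w1, w2):
        """Score one ant: 5 food sensors -> hidden layer -> 3 actions (left, forward, right)."""
        pos, heading, eaten, food = (GRID // 2, GRID // 2), 0, 0, set(FOOD)
        for _ in range(STEPS):
            # Sense food to the left, diagonal-left, ahead, diagonal-right, right.
            senses = np.array([
                float(((pos[0] + DIRS[(heading + off) % 8][0]) % GRID,
                       (pos[1] + DIRS[(heading + off) % 8][1]) % GRID) in food)
                for off in (-2, -1, 0, 1, 2)])
            action = int(np.argmax(np.tanh(senses @ w1) @ w2))
            if action == 0:
                heading = (heading - 2) % 8                      # turn left
            elif action == 2:
                heading = (heading + 2) % 8                      # turn right
            else:
                pos = ((pos[0] + DIRS[heading][0]) % GRID,
                       (pos[1] + DIRS[heading][1]) % GRID)       # move forward
            if pos in food:
                food.discard(pos)
                eaten += 1
        return eaten

    # Plain GA over the weights: keep the top 10%, mutate copies of them to refill the pool.
    HIDDEN, POP = 8, 200
    pool = [(rng.normal(size=(5, HIDDEN)), rng.normal(size=(HIDDEN, 3))) for _ in range(POP)]
    for gen in range(50):
        pool.sort(key=lambda w: run_ant(*w), reverse=True)
        elite = pool[: POP // 10]
        pool = elite + [
            (p1 + 0.1 * rng.normal(size=p1.shape), p2 + 0.1 * rng.normal(size=p2.shape))
            for p1, p2 in (elite[rng.integers(len(elite))] for _ in range(POP - len(elite)))
        ]
        print(gen, run_ant(*pool[0]))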

At the time, I convinced myself I was selecting for instinctive knowledge...

And I was very excited by research that said that, rather than having one pool of 10,000 ants...

It was better to have 10 islands of 1,000 ants, and to occasionally let genetic information travel from one island to another. The research claimed the overall system would converge faster.

I thought that was super cool, and it made me excited that easy parallelism would be rewarded.
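
The island scheme is only a few extra lines on top of a basic GA loop, something like this sketch (toy fitness function; the migration interval and counts are invented, not from the research):

    import numpy as np

    rng = np.random.default_rng(1)
    N_ISLANDS, ISLAND_POP, GENOME_LEN = 10, 1000, 40
    MIGRATE_EVERY, N_MIGRANTS = 25, 5
    TARGET = rng.normal(size=GENOME_LEN)         # stand-in fitness: match a fixed target vector

    def fitness(g):
        return -np.sum((g - TARGET) ** 2)

    def step(pop):
        """One generation on one island: rank, keep the elites, crossover + mutate to refill."""
        pop = sorted(pop, key=fitness, reverse=True)
        elite = pop[: ISLAND_POP // 10]
        children = []
        while len(elite) + len(children) < ISLAND_POP:
            a, b = elite[rng.integers(len(elite))], elite[rng.integers(len(elite))]
            cut = rng.integers(1, GENOME_LEN)                            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            children.append(child + 0.05 * rng.normal(size=GENOME_LEN))  # mutation
        return elite + children

    islands = [[rng.normal(size=GENOME_LEN) for _ in range(ISLAND_POP)] for _ in range(N_ISLANDS)]
    for gen in range(200):
        islands = [step(pop) for pop in islands]
        if gen % MIGRATE_EVERY == 0:
            # Ring migration: each island ships copies of its best few genomes to its neighbor.
            for i in range(N_ISLANDS):
                best = sorted(islands[i], key=fitness, reverse=True)[:N_MIGRANTS]
                islands[(i + 1) % N_ISLANDS][-N_MIGRANTS:] = [g.copy() for g in best]
        if gen % 50 == 0:
            print(gen, round(max(fitness(g) for g in islands[0]), 3))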

I daydream about all of that, still.

nrhrjrjrjtntbt 2 days ago | parent | prev | next [-]

Backprop is learnable through Karpathy's videos, but it takes a lot of patience. The key thing is the chain rule. Get that, and the rest is mostly understanding what the bulk operations on tensors are doing (they're usually doing something simple enough, but it's easy to make mistakes).
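
For example, on a toy two-layer net the whole backward pass is just the chain rule applied a few times in a row (a sketch, not something from the videos):

    import numpy as np

    rng = np.random.default_rng(0)
    x, target = rng.normal(size=3), rng.normal(size=2)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

    # Forward pass: x -> h = tanh(W1 x) -> y = W2 h -> loss = ||y - target||^2
    h = np.tanh(W1 @ x)
    y = W2 @ h
    loss = np.sum((y - target) ** 2)

    # Backward pass: the chain rule applied one step at a time, from the loss back to the weights.
    dL_dy = 2 * (y - target)                  # d loss / d y
    dL_dW2 = np.outer(dL_dy, h)               # through y = W2 h
    dL_dh = W2.T @ dL_dy                      # push the gradient back through W2
    dL_dpre = dL_dh * (1 - h ** 2)            # through tanh (its derivative is 1 - tanh^2)
    dL_dW1 = np.outer(dL_dpre, x)             # through pre = W1 x

    # One gradient-descent step; the loss on the same input should drop.
    W1 -= 0.01 * dL_dW1
    W2 -= 0.01 * dL_dW2
    print(loss, "->", np.sum((W2 @ np.tanh(W1 @ x) - target) ** 2))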

embedding-shape 2 days ago | parent | prev | next [-]

> Backpropagation is too much for my little brain to handle.

I just stumbled upon a very nice description of the core of it, right here: https://www.youtube.com/watch?v=AyzOUbkUf3M&t=133s

Almost all talks by Geoffrey Hinton (left side on https://www.cs.toronto.edu/~hinton/) are very approachable if you're passingly familiar with some ML.

bob1029 2 days ago | parent | prev | next [-]

My entire motivation for using GAs is to get away from backpropagation. When you aren't constrained by differentiability and the chain rule of calculus, you can approach problems very differently.

For example, evolving program tapes is not something you can backpropagate through. Having a symbolic, procedural representation of something as effective as ChatGPT currently is would be a holy grail in many contexts.
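
A toy version of what I mean, just to show there is no gradient anywhere (the instruction set, tape length, and target behavior are all made up for the sketch):

    import random

    random.seed(0)
    OPS = ["inc", "dec", "double", "add_x", "sub_x", "nop"]     # made-up instruction set
    TAPE_LEN, POP, GENS = 12, 100, 300
    CASES = [(x, 3 * x + 2) for x in range(-5, 6)]              # behavior we want the tape to have

    def run(tape, x):
        acc = 0
        for op in tape:                  # a straight-line "program tape": nothing differentiable
            if op == "inc":      acc += 1
            elif op == "dec":    acc -= 1
            elif op == "double": acc *= 2
            elif op == "add_x":  acc += x
            elif op == "sub_x":  acc -= x
        return acc

    def loss(tape):
        return sum(abs(run(tape, x) - y) for x, y in CASES)

    pop = [[random.choice(OPS) for _ in range(TAPE_LEN)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=loss)
        elite = pop[: POP // 5]
        pop = elite + [
            [op if random.random() > 0.1 else random.choice(OPS)   # point mutation on the tape
             for op in random.choice(elite)]
            for _ in range(POP - len(elite))
        ]
    best = min(pop, key=loss)
    print(loss(best), best)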

DennisP 2 days ago | parent | prev | next [-]

I do too, and for the same reasons. Levy's book had a huge impact on me in general.

acjohnson55 2 days ago | parent | prev [-]

You can definitely understand backpropagation, you just gotta find the right explainer.

On a basic level, it's kind of like if you had a calculation for aiming a cannon, and someone was giving you targets to shoot at one by one. Each time you miss the target, they tell you how much you missed by and in what direction. You could tweak your calculation each time, and it should get more accurate if you do it right.

Backpropagation is based on a mathematical solution for exactly how you make those tweaks, taking advantage of some calculus. If you're comfortable with calculus, you can probably understand it. If not, you might have some background knowledge to pick up first.
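
In code, the whole "tweak it a little each miss" loop is just a few lines (toy numbers; a linear aiming rule stands in for the network, and the "correct" rule is made up for the example):

    import random

    random.seed(0)

    def true_angle(d):                      # the "correct" aiming rule, which we don't know
        return 30.0 * d + 5.0

    w, b, lr = 0.0, 0.0, 0.1                # our adjustable calculation: angle = w * d + b

    for shot in range(5000):
        d = random.random()                 # someone hands us a target (distance, scaled 0..1)
        miss = (w * d + b) - true_angle(d)  # how much we missed by, and in which direction
        # Calculus says exactly how to tweak: d(miss^2)/dw = 2*miss*d and d(miss^2)/db = 2*miss.
        w -= lr * 2 * miss * d
        b -= lr * 2 * miss
    print(round(w, 2), round(b, 2))         # ends up near 30 and 5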

dcrimp a day ago | parent | prev | next [-]

I've been messing around with GAs recently, especially indirect encoding methods. This paper seems to support perspectives I've read while researching, in particular that you can decompose weight matrices into spectral patterns, similar to JPEG compression, and search in the compressed space.

Something I've been interested in recently: would it be possible to encode a known-good model, some massive pretrained thing, and use it as a starting point for further mutations?

Like some other comments in this thread have suggested, it would mean we could distill the weight patterns of things like attention, convolution, etc. and not have to discover them by mutation, making use of the many PhD-hours it took to develop those patterns and using them as a springboard. If papers like this are to be believed, more advanced mechanisms might then be discoverable.
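
Roughly what I mean by searching in compressed space, as a sketch (JPEG-style 2D DCT via scipy; the "pretrained" matrix here is just random noise standing in for real weights):

    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(0)
    K = 16                                        # keep a K x K block of low-frequency coefficients
    pretrained_W = rng.normal(size=(256, 256))    # stand-in for a known-good weight matrix

    def encode(W):
        """Genome = the low-frequency corner of the matrix's 2D DCT (JPEG-style)."""
        return dctn(W, norm="ortho")[:K, :K].copy()

    def decode(genome, shape):
        """Rebuild a full-size weight matrix from the compressed genome."""
        coeffs = np.zeros(shape)
        coeffs[:K, :K] = genome
        return idctn(coeffs, norm="ortho")

    # Seed the search at the pretrained weights instead of at random...
    genome = encode(pretrained_W)

    # ...then mutate in the 16x16 coefficient space instead of the 256x256 weight space.
    child_genome = genome + 0.01 * rng.normal(size=genome.shape)
    child_W = decode(child_genome, pretrained_W.shape)

    print(genome.size, "coefficients searched vs", pretrained_W.size, "raw weights")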

altairprime 2 days ago | parent | prev | next [-]

That would be an excellent use of GA and all the other 'not based on training a network' methods, now that we have a target and can evaluate against it!

joquarky 2 days ago | parent | prev | next [-]

I got crazy obsessed with EvoLisa¹ back in the day, and although there is nothing in common between that algorithm and the ones used to train an LLM, I can't help but feel like they are similar.

¹ https://www.rogeralsing.com/2008/12/07/genetic-programming-e...

CalChris 2 days ago | parent | prev [-]

I'm the same but with vector quantization.