dcrimp 2 days ago

I've been messing around with genetic algorithms (GA) recently, especially indirect encoding methods. This paper seems to support perspectives I've come across while researching. In particular, that you can decompose weight matrices into spectral patterns, similar to JPEG compression, and then search in that compressed space.
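To make the JPEG analogy concrete, here's a rough sketch of what I mean by searching in compressed space. It assumes a 2D DCT as the spectral basis (my choice for illustration, not necessarily what the paper uses), and `dct_basis` and `decode` are names I made up. The genome is only the k x k block of low-frequency coefficients, and the full weight matrix is rebuilt from it.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is the k-th frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis

def decode(genome, shape, k):
    """Expand k*k low-frequency coefficients into a full weight matrix.

    Assumes k <= min(shape). All higher-frequency coefficients are zero,
    which is the JPEG-like part of the idea.
    """
    rows, cols = shape
    coeffs = np.zeros(shape)
    coeffs[:k, :k] = np.asarray(genome, dtype=float).reshape(k, k)
    # Inverse 2D DCT: W = C_rows^T @ coeffs @ C_cols
    return dct_basis(rows).T @ coeffs @ dct_basis(cols)

rng = np.random.default_rng(0)
genome = rng.normal(size=9)            # 9 numbers searched by the GA
weights = decode(genome, (8, 6), k=3)  # 48 weights actually used by the network
```

So the GA mutates 9 numbers instead of 48, and every mutation changes the whole matrix in a smooth way.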

Something I've been wondering about recently: would it be possible to encode a known-good model (some massive pretrained thing) and use that as a starting point for further mutations?
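A toy version of that idea, as a sketch only: take a weight matrix, project it onto a truncated DCT basis to get a seed genome, then build the initial population by mutating the seed. The DCT choice, the helper names (`encode`, `decode`, `mutate`) and the smooth stand-in "pretrained" matrix are all assumptions of mine. Whether real pretrained weights compress anywhere near this well is exactly the open question.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis, shape (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis

def encode(weights, k):
    """Forward 2D DCT, keeping only the k x k lowest-frequency coefficients."""
    rows, cols = weights.shape
    coeffs = dct_basis(rows) @ weights @ dct_basis(cols).T
    return coeffs[:k, :k].copy()

def decode(genome, shape):
    """Inverse 2D DCT with every coefficient outside the genome set to zero."""
    k = genome.shape[0]
    coeffs = np.zeros(shape)
    coeffs[:k, :k] = genome
    return dct_basis(shape[0]).T @ coeffs @ dct_basis(shape[1])

def mutate(genome, rng, sigma=0.01):
    """Gaussian mutation applied in coefficient space."""
    return genome + rng.normal(scale=sigma, size=genome.shape)

rng = np.random.default_rng(0)

# Hypothetical stand-in for one layer of a pretrained model (smooth on purpose).
pretrained = np.outer(np.sin(np.linspace(0, np.pi, 16)),
                      np.cos(np.linspace(0, np.pi, 12)))

seed = encode(pretrained, k=6)                      # 36 numbers instead of 192
population = [mutate(seed, rng) for _ in range(8)]  # GA starts near the known-good model
recon_error = np.linalg.norm(decode(seed, pretrained.shape) - pretrained)
```

The encoding is lossy, so the seed is only an approximation of the original model. `recon_error` tells you how much was thrown away before any mutation happens.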

As some other comments in this thread have suggested, that would mean we could distill the weight patterns of things like attention, convolution, etc. rather than having to discover them by mutation. That way we'd make use of the many PhD-hours it took to develop those patterns and use them as a springboard. If papers like this are to be believed, more advanced mechanisms might then be discoverable.