momojo 18 hours ago

I'm surprised the point/comment ratio is this skewed. There's so much meat in the post to chew on. I like your writing. This was one of those blogs where I can tell you spent a massive amount of time on the technical, but simplified it to layman's terms. I hope you keep putting out stuff :).

I have a couple questions:

1. I think this quote should be raising *many more* eyebrows.

> The astounding thing about Goliath wasn’t that it was a huge leap in performance, it was that the damn thing functioned at all. To this day, I still don’t understand why this didn’t raise more eyebrows.

You put a cat's brain into a dog's head and it's still breathing! It didn't flatline immediately! And it's already yesterday's news? This seems like the biggest takeaway. Why isn't every <MODEL_PROVIDER> attempting LLM surgery at this moment? Have you noticed any increased discourse in this area?

2. You mentioned you spent the beginning of your career looking at brains in biotech. How did you end up in a basement of GPUs, working not in biotech, but still kind of looking at brains?

Again, great post!

dnhkng 11 hours ago

Cheers. I will go back through my other old projects (optogenetics, hacking CRISPR/Cas9, etc.) and put them on my blog.

On your questions: 1) A few other papers have been mentioned in the thread, like SOLAR 10.7B. They duplicated the whole transformer stack, and it kinda helped. But as I found experimentally, that's probably not a great idea. You are duplicating 'organs' (i.e. input-processing stuff) that should only have one copy. Also, that paper didn't see immediate improvements; they had to do continued pre-training to see benefits. At that point, I'm guessing the big labs stopped bothering. Limited by hardware, I had to find unusual angles to approach this topic.
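For readers unfamiliar with the SOLAR 10.7B approach mentioned above: it "depth up-scales" a model by concatenating the first n − k and last n − k transformer blocks, so a middle span gets duplicated while the embedding and output head stay single-copy. A minimal sketch of that slicing, using placeholder block names rather than any real checkpoint's layers:

```python
def depth_upscale(layers, overlap):
    """SOLAR-style depth up-scaling (illustrative sketch):
    keep the first (n - overlap) blocks, then append the last
    (n - overlap) blocks, duplicating the span in between.
    Embeddings and the LM head would be handled separately."""
    n = len(layers)
    return layers[: n - overlap] + layers[overlap:]

# Toy 8-block stack; real models have 32+ blocks.
blocks = [f"block_{i}" for i in range(8)]
deeper = depth_upscale(blocks, overlap=2)
# 6 + 6 = 12 blocks total; block_2..block_5 each appear twice.
```

This is only the structural half of the recipe; as noted above, the resulting model still needed continued pre-training before it showed benefits.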

2) Nah, no more wetware for me. I did half a decade of research at a big neurobiology institute, and while it was very enjoyable, I can truly say that grant writing and paper review are 'not my thing'. The reason this info was delayed so long is that I wanted a paper in the AI field to go along with my papers in other fields. But as a hobbyist with no official affiliation, and the attention span of a gnat, I gave up and started a blog instead. Maybe someone will cite it?

trhway 8 hours ago

> You put a cat's brain into a dog's head and it's still breathing! It didn't flatline immediately! And it's already yesterday's news?

I think it isn't surprising, given how, for example, kernels in the first layers of visual CNNs converge to Gabors, which are also the neuron transfer functions in the first layers of cat, human, etc. visual cortices, and given that there is math proving that such kernels are optimal (under some reasonable conditions).
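For concreteness, a Gabor kernel is just a Gaussian envelope multiplied by an oriented sinusoidal carrier. A minimal sketch (parameter names and defaults are my own choices, not from any particular library):

```python
import math

def gabor_kernel(size=9, sigma=2.0, theta=0.0, wavelength=4.0, phase=0.0):
    """Build a size x size Gabor filter as a list of rows:
    an isotropic Gaussian envelope times a cosine carrier
    oriented at angle theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the carrier's direction.
            x_t = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * x_t / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

k = gabor_kernel()
```

Convolving an image with a bank of these at different orientations and wavelengths gives edge/texture responses much like those the first CNN layers converge to.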

And so I'd expect that the layers inside an LLM reach, or come close to, some optimality that is universal across brains and LLMs (the main drivers of such optimality being energy (various L2-like metrics), information compression, and entropy).