ahmadyan 2 hours ago

As a hacker, I kinda like Noam's code. I had to implement a TC MoE kernel and stumbled upon his code from [tensor2tensor](https://github.com/tensorflow/tensor2tensor/blob/master/tens...), and I think "alchemy" is justified. Dude writes some beautiful kernels.

He also saw that LLMs would replace search before anyone else did. It is really something to look at LaMDA's or GPT-1's output and think: yeah, this will answer all of our questions one day.

jvican an hour ago | parent | next [-]

There's no doubt about Noam's abilities. But I read through that code, and I struggle to see its 'magic' or 'alchemy'. Can you elaborate on what you find especially good about that code? (You may assume GPU kernel programming knowledge on my end.)

dekhn an hour ago | parent | next [-]

To me the magic Noam moment was when he came to my team and said "that cluster has a bad node in it, but this other one doesn't" and we had to spend like a week tracking down a single bad processor out of thousands.

jeswin an hour ago | parent | prev [-]

Unrelated to the particular code above. There's a difference between writing code about or adjacent to a proven idea vs writing code in uncharted territory. I suspect that is what happened here. It's the same thing with, say, music and art. A lot of people today can play Chuck Berry.

jvican an hour ago | parent [-]

It's a good point. Though I do wonder if the magic he cast was more at the conceptual level (intense belief in a set of primitives that ought to work) than in the code itself. Even by 2018's standards, the TensorFlow code above doesn't really look that impressive. It's hard to judge based on those past standards, though. But I wonder if somebody who knows more than me can elaborate.

eli_gottlieb an hour ago | parent | prev [-]

Also, evaluating complicated functions with numerical stability and automatic differentiation is hard.