▲ | fern_ 18 hours ago | |
Hi! I’m fern. I have a fair bit of experience with neural networks (almost 10 years total!) across computer vision and language modeling. I’ve set world records on a few benchmarks: nearly tripling the speed of a longstanding CIFAR10 world record (~17.1s -> 6.3s, https://github.com/tysam-code/hlb-CIFAR10, recognized by Karpathy: https://x.com/karpathy/status/1620103412686942208), setting a modded-nanogpt speed record (5.03m -> 4.66m, https://github.com/KellerJordan/modded-nanogpt/blob/master/r... ), improving the relative loss of DiLoCo speedrunning on modded-nanogpt by over 40% (https://x.com/hi_tysam/status/1928561266533990587), and miscellaneous other misadventures throughout the years. I’m looking to be challenged in a strong technical environment with people I can learn from, and I’m open to working on most interesting problems as long as they’re roughly net neutral or positive. I’m very hard-problem oriented, and am quite skilled at distilling complex, slow, brittle training procedures down to simple, performant, and hackable ones. The problems that engage me most are ones where I can always be growing and learning, and that’s the kind of environment I’ll thrive in. I’m certainly open to learning new subdomains (molecular, etc). I tend to focus more on data/architecture/training dynamics than raw low-level kernel stuff, though the two tend to go hand-in-hand. I greatly enjoy mentoring people, especially casual mentorship. I’d like to learn to work with larger-scale distributed runs, since scale is something I want more experience with. I’m keeping an eye open for an opportunity that’s a good mutual fit, so I will approach most interview processes thoughtfully. If you think you have something that would be a good fit for us both, say hi! |