fern_ 6 days ago

  Location: MD, USA
  Remote: Yes, open to hybrid/in-person for the right opportunity
  Willing to relocate: Yep
  Technologies: ML stack tech – PyTorch, Python, some Docker, Transformers, ConvNets, LLMs, etc
  Résumé/CV: https://gist.githubusercontent.com/tysam-code/a7c49dcb72416b9cf62c399f882ae7b5/raw/6931376658adaf545c20f928247e8fe6851f3f14/fern_resume.txt (curlable, originally made for 80 px terminal. basic tech filter)
  Email: hi [.dot.] re [.dot.] tysam [atsymbol] gmail [.dot.] com
Hi! I’m fern. I’m an experienced ML researcher and developer (almost 10 years total!) with a wide range of experience spanning computer vision and language modeling. I have an excellent intuition for research directions and am good at building codebases that allow for rapid research iteration. In the open source world, I’ve set world records on several benchmarks: nearly tripling the speed of a longstanding CIFAR10 world record (~17.1s -> 6.3s, https://github.com/tysam-code/hlb-CIFAR10, recognized by Karpathy: https://x.com/karpathy/status/1620103412686942208), setting a modded-nanogpt speed record and helping with several others (5.03m -> 4.66m, https://github.com/KellerJordan/modded-nanogpt/blob/master/r... ), improving the relative loss of DiLoCo speedrunning on modded-nanogpt by over 40% (https://x.com/hi_tysam/status/1928561266533990587), plus miscellaneous other misadventures throughout the years.

I’m looking to be challenged in a strong technical environment alongside people I can learn from, and I’m open to working on most interesting problems that I think are neutral or bring something positive to the world. I’m very hard-problem oriented, and I’m quite skilled at distilling complex, slow, brittle training procedures down into simple, performant, hackable ones. The problems that engage me most are ones where I can always be growing and learning, and that’s the kind of environment I’ll thrive in. Learning new subdomains (molecular, etc.) keeps me driven; I love the process of expanding my toolkit. That said, I also very much enjoy working on problems well within my domain of expertise.

My focus is on data/architecture/training dynamics and debugging for training efficiency more than raw low-level kernel creation, though the two go hand in hand. I enjoy mentoring people quite a bit! I’m also looking to learn to work with larger-scale distributed runs; scale is something I’d like more hands-on experience with.

If this sounds like it may align with what you’re doing – say hi! I’d love to chat.