newu1729 2 hours ago

Location: India
Remote: Yes (Worldwide)
Willing to relocate: Yes (Visa sponsorship / relocation welcome)

Technologies: C++, CUDA, Python, PyTorch, Linux, GPU Programming, Deep Learning Systems, AI Infrastructure, AI Inference, CUDA Kernels, Flash Attention, Autograd, SIMD, pybind11

GitHub: https://github.com/Plkmoi/NexusGrad
Résumé/CV: https://drive.google.com/file/d/1yyu4JSCIXtHZ3H4WF-d2jN2z-pC...
Email: nararaab01@gmail.com

I'm an engineer focused on deep learning systems and AI infrastructure.

Over the past few years I've been building NexusGrad, a deep learning framework written from scratch in modern C++ and CUDA to better understand how frameworks like PyTorch work internally.

The project includes:
• Custom tensor library (CPU/CUDA)
• Reverse-mode automatic differentiation
• Explicit computational graph runtime
• CUDA kernels for tensor operations and attention
• Flash Attention
• ALiBi attention
• Transformer components
• AVX2/OpenMP CPU kernels
• pybind11 Python bindings
• Numerical verification against PyTorch using finite-difference gradient checking

I'm interested in roles involving AI model optimization, LLM inference, deep learning runtimes, AI infrastructure, CUDA development, GPU software, distributed training/inference systems, and performance engineering.

Happy to work remotely worldwide or relocate with visa sponsorship.