newu1729 2 hours ago
Location: India
Remote: Yes (Worldwide)
Willing to relocate: Yes (Visa sponsorship / relocation welcome)
Technologies: C++, CUDA, Python, PyTorch, Linux, GPU Programming, Deep Learning Systems, AI Infrastructure, AI Inference, CUDA Kernels, Flash Attention, Autograd, SIMD, pybind11
GitHub: https://github.com/Plkmoi/NexusGrad
Résumé/CV: https://drive.google.com/file/d/1yyu4JSCIXtHZ3H4WF-d2jN2z-pC...
Email: nararaab01@gmail.com

I'm an engineer focused on deep learning systems and AI infrastructure. Over the past few years I've been building NexusGrad, a deep learning framework written from scratch in modern C++ and CUDA to better understand how frameworks like PyTorch work internally.

The project includes:
• Custom tensor library (CPU/CUDA)
• Reverse-mode automatic differentiation
• Explicit computational graph runtime
• CUDA kernels for tensor operations and attention
• Flash Attention
• ALiBi attention
• Transformer components
• AVX2/OpenMP CPU kernels
• pybind11 Python bindings
• Numerical verification against PyTorch using finite-difference gradient checking

I'm interested in roles involving AI model optimization, LLM inference, deep learning runtimes, AI infrastructure, CUDA development, GPU software, distributed training/inference systems, and performance engineering. Happy to work remotely worldwide or relocate with visa sponsorship.