
NicoConstant 5 hours ago
Kog (https://kog.ai) | GPU Engineer | Paris, France | REMOTE within a Europe-compatible timezone, one week per month onsite in Paris

We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs. You would own low-level kernel work in CUDA/PTX or HIP/CDNA ISA, the monokernel pipeline and the profiling infrastructure inside it, scaling to the frontier MoE models that run in production, and building our own agents that optimize kernels and inference autonomously.

We generate 3,000 tokens/s per request on 8x AMD MI300X and 2,100 tokens/s on 8x NVIDIA H200, at batch size 1, in FP16, with no speculative decoding. At batch size 1, decode reduces to GEMV (matrix-vector products), so it is memory-bandwidth bound and MBU is the metric that counts. We rewrote the whole hot path ourselves, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel.

Try it at https://playground.kog.ai

Showing your code is part of the process. If you are outside a Europe-compatible timezone, relocation to one is required.

Apply: https://jobs.ashbyhq.com/kog/e3950334-a2a6-43cc-a744-df6c386...

Questions? Email me at nicolas.constant@kog.ai
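
For readers who want to see why MBU matters for batch-size-1 decode, here is a rough back-of-envelope sketch. The throughput figures are the ones quoted in the post. Everything else is an assumption: the per-GPU peak HBM bandwidths are taken from the vendors' public datasheets rather than from the post, MBU is read here as achieved memory traffic divided by peak memory bandwidth, the function names are made up for illustration, and the model size is a purely hypothetical placeholder since the post does not state one.

```python
# Back-of-envelope bandwidth arithmetic for batch-size-1 decode.
# Throughputs come from the post; peak bandwidths are vendor datasheet
# values (an assumption, not stated in the post).

PEAK_HBM_BYTES_PER_S = {
    "MI300X": 5.3e12,  # per GPU, datasheet peak
    "H200": 4.8e12,    # per GPU, datasheet peak
}


def max_bytes_per_token(gpu, num_gpus, tokens_per_s):
    """Upper bound on bytes that can be read from HBM per generated token.

    This is the ceiling at 100% bandwidth utilization: aggregate peak
    bandwidth divided by the token rate.
    """
    return PEAK_HBM_BYTES_PER_S[gpu] * num_gpus / tokens_per_s


def mbu(tokens_per_s, bytes_read_per_token, gpu, num_gpus):
    """Achieved memory traffic divided by aggregate peak bandwidth."""
    achieved = tokens_per_s * bytes_read_per_token
    peak = PEAK_HBM_BYTES_PER_S[gpu] * num_gpus
    return achieved / peak


# Ceilings implied by the quoted throughputs.
for gpu, rate in (("MI300X", 3000), ("H200", 2100)):
    ceiling_gb = max_bytes_per_token(gpu, 8, rate) / 1e9
    print(f"8x {gpu} at {rate} tok/s: at most {ceiling_gb:.1f} GB read per token")

# HYPOTHETICAL model: 4e9 active parameters in FP16 (2 bytes each),
# ignoring KV-cache and activation traffic. Not a figure from the post.
HYPOTHETICAL_BYTES_PER_TOKEN = 4e9 * 2
print(f"MBU on 8x MI300X: {mbu(3000, HYPOTHETICAL_BYTES_PER_TOKEN, 'MI300X', 8):.1%}")
```

The point of the ceiling is that in a GEMV-dominated decode every active weight is read once per token, so the token rate is capped by how many bytes the model needs per token. Compute throughput barely enters the picture.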