Remix clone Hacker News

new | show | ask | jobs Github

	▲	jodoherty 4 hours ago
		I use pi with an RTX Pro 6000 Blackwell to run Gemma 4 31b to do all my agentic coding. I find it useful. This side project highlights a similar approach to how I scope and tackle projects at work now: https://git.theodohertyfamily.com/wg-wrap.git/tree/README.md https://git.theodohertyfamily.com/wg-wrap.git/tree/CASE_STUD... You have to apply a lot of careful architecture and TDD to your approach. Eliminate technical risk by tackling hard things early and wrapping them up in a simple, easy to use interface. I find I can get some projects done 2-3 times faster than if I wrote them by hand. It can also save about 5-10x time on mundane or broadly scoped projects by helping me consolidate and try out ideas very quickly. Setup-wise, I switch between vLLM using nvidia/Gemma-4-31B-IT-NVFP4 and llama.cpp using unsloth/gemma-4-31B-it-qat-GGUF with MTP. I throttle the GPU power usage to 400W. My current llama.cpp setup gets token generation rates between 60-150 t/s depending on MTP draft acceptance rates. Prefill is between 1500-4000 t/s depending on context length/depth.