Juminuvi 9 hours ago
I know you say you don't use the paid APIs, but renting a GPU is something I've been thinking about, and I'd be really interested in knowing how this compares with paying by the token. I think gpt-oss-120b is $0.10 input / $0.60 output per million tokens on Azure. In my head this could go a long way, but I haven't used gpt-oss agentically long enough to really understand usage. Just wondering if you know / would be willing to share your typical usage and token spend on that dedicated hardware?
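The comparison above comes down to simple arithmetic: multiply monthly token volumes by the per-million rates and compare against a flat rental price. A rough sketch, using the Azure gpt-oss-120b prices quoted above; the monthly token volumes and the GPU hourly rate are made-up assumptions for illustration, not real usage figures:

```python
# Back-of-the-envelope: pay-per-token vs. a rented GPU.
# $0.10/$0.60 per million tokens are the Azure gpt-oss-120b rates
# quoted above; all other numbers are hypothetical.

INPUT_PRICE = 0.10 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # $ per output token

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Monthly cost when paying by the token."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Assumed read-heavy agentic month: 500M input, 20M output tokens.
monthly = api_cost(500_000_000, 20_000_000)
print(f"API cost: ${monthly:.2f}")  # $50 input + $12 output = $62.00

# Assumed rented GPU at $1.50/hour, running 24/7 for a 30-day month:
gpu_monthly = 1.50 * 24 * 30
print(f"GPU rental: ${gpu_monthly:.2f}")  # $1080.00
```

Under these assumed volumes the per-token route is far cheaper; the rental only wins once usage grows well past that, or when the GPU is shared across other workloads.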
KronisLV 38 minutes ago
For comparison, here's my own usage with various cloud models for development:
As for Cerebras in October, I don't have the data because they don't show the Qwen3 Coder model that was deprecated, but it was way more: https://blog.kronis.dev/blog/i-blew-through-24-million-token...

In general, I'd say that for the stuff I do my workloads are extremely read heavy (referencing existing code, patterns, tests, build and check script output, implementation plans, docs etc.), but it goes about like this: