MehdiBelkacem 6 hours ago
Token anxiety is real. What worked for me: prompt caching on fixed system prompts cut my Anthropic bill by ~60% overnight. Most devs don't realize cache reads are 10x cheaper than regular input tokens on Claude (writes cost ~1.25x, so caching pays off from the second hit onward). Local models for classification/routing + frontier only for generation is the other move, but the latency tradeoff is real if you're in a user-facing flow.
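Rough sketch of the caching bit, assuming the `anthropic` Python SDK (the model id and prompt text here are placeholders): you mark the stable system block with `cache_control` and keep it byte-identical across requests so the prefix actually hits the cache.

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build Messages API kwargs with the fixed system prompt marked cacheable."""
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder; use whatever you run
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # First request writes the cache (~1.25x input price);
                # later requests within the cache TTL read it at ~0.1x.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# Then per request:
#   import anthropic
#   client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
#   resp = client.messages.create(**build_cached_request(FIXED_PROMPT, msg))
```

The key detail: anything before the `cache_control` marker must not change between calls, or you pay the write price again instead of the read price.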
OccamsMirror 5 hours ago
How did you do that?