cjparadise 15 hours ago
Don't quantize, use CONVERA. Instead of focusing only on faster hardware or larger models, it focuses on:

> Reusing work that has already been done.

In its current public form, CONVERA:

- runs LLMs locally (HuggingFace)
- executes prompts through a controlled runtime
- caches repeated prompt results
- detects reuse opportunities
- delivers measurable latency improvements on repeat runs
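The caching step above (repeated prompts skip inference entirely) can be sketched roughly like this. All names here (`PromptCache`, `slow_model`) are illustrative only, not CONVERA's actual API; this is just the general exact-match prompt-cache pattern:

```python
import hashlib
import time

class PromptCache:
    """Minimal exact-match prompt cache: repeated prompts skip the model call."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        # Hash the prompt so the cache key has a fixed size.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def run(self, prompt: str, model_fn):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1          # reuse detected: return cached result
            return self._store[key]
        self.misses += 1
        result = model_fn(prompt)   # cold path: actually run the model
        self._store[key] = result
        return result

def slow_model(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for real LLM inference latency
    return f"echo: {prompt}"

cache = PromptCache()
cache.run("hello", slow_model)   # miss: calls the model
cache.run("hello", slow_model)   # hit: served from cache, no model call
print(cache.hits, cache.misses)  # → 1 1
```

The latency win on repeat runs comes entirely from the hit path avoiding `model_fn`; in a real system the cache would also need an eviction policy and, for non-deterministic sampling, a decision about whether cached replies are acceptable.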