	gormen 2 hours ago
		Most interpretability methods fail for LLMs because they try to explain outputs without modeling the intent, constraints, or internal structure that produced them. Token‑level attribution is useful, but without a framework for how the model reasons, you’re still explaining shadows on the wall.