
	▲	seanhunter 4 hours ago
		I read that too, but I wondered whether elementwise error is the right metric. Surely the actual error metric should be to evaluate model performance for a conventional transformer model and then the same model with the attention mechanism replaced by this 4th order Taylor approximation?
	▲	vlovich123 3 hours ago
		A bounded error on the weights is, by definition, a stricter evaluation criterion than “performance” metrics obtained by running the model.
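
		To make the two criteria concrete, here is a minimal sketch for a single attention row. It assumes the approximation simply swaps `exp` in the softmax for its 4th-order Taylor polynomial, which may not be exactly what the paper does; `taylor_exp4`, `attention_weights`, `attend`, and the toy `scores`/`values` are all made up for illustration. It shows that an elementwise bound on the weights gives a bound on the output of that one row; it says nothing about how errors compound across layers or affect task-level metrics.

```python
import math

def taylor_exp4(x):
    # 4th-order Taylor polynomial of exp around 0 (assumed form of the approximation)
    return 1.0 + x + x**2 / 2.0 + x**3 / 6.0 + x**4 / 24.0

def attention_weights(scores, exp_fn):
    # Softmax over one row of scores, using the supplied exponential function
    vals = [exp_fn(s) for s in scores]
    total = sum(vals)
    return [v / total for v in vals]

def attend(weights, values):
    # Weighted sum of scalar values (one output coordinate)
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical toy inputs, kept small so the Taylor expansion is accurate
scores = [0.3, -0.2, 0.1, 0.5]
values = [1.0, -2.0, 0.5, 3.0]

exact = attention_weights(scores, math.exp)
approx = attention_weights(scores, taylor_exp4)

# Elementwise criterion: largest absolute difference between weights
max_weight_err = max(abs(a - b) for a, b in zip(exact, approx))

# Output criterion: difference in the attended value
output_err = abs(attend(exact, values) - attend(approx, values))

# The weight bound implies an output bound for this row:
# |sum (w_i - w'_i) v_i| <= max_i |w_i - w'_i| * sum_i |v_i|
bound = max_weight_err * sum(abs(v) for v in values)

print("max elementwise weight error:", max_weight_err)
print("output error:", output_err)
print("bound implied by weight error:", bound)
```

		With larger scores the polynomial drifts further from `exp`, so the elementwise error (and hence the bound) grows; the sketch only covers the small-score regime.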