Remix clone Hacker News

new | show | ask | jobs Github

	▲	fc417fc802 a day ago
		Personally I could accept "most" provided that the failures were near misses as opposed to total face plants. I also wouldn't include "incompatible" tasks in the metric at all (but using that to game the metric can't be permitted either). For example the typical human only has so much working memory, so tasks which overwhelm that aren't "failed" so much as "incompatible". I'm not sure exactly what that looks like for ML but I expect the category will exist. A task that utilizes adversarial inputs might be an example of such.