Remix clone Hacker News

new | show | ask | jobs Github

	▲	FromTheFirstIn 2 hours ago
		My understanding is that the true entropy floor of a language is intractable- regardless of context length there will be “unpredictable” tokens where cross entropy loss is bound to happen. Even with infinite parameters and data you’ll still have a chance at failing to predict the next token correctly a decent chunk of the time. Also, linear gains in context length scale quadratically with compute because of attention, so depending on context growth means taking a bath on GPUs for as long as you can, right?