Ask HN: Why would we care about "extended time horizons" and LLMs?

		2 points by ozozozd 8 hours ago | 1 comment
		Is it more impressive to take longer to answer 2 + 2? It’s not. The longer one takes, the less intelligent we would rate that person. Somehow, for AI agents, taking longer is getting praise under the framing “maintaining attention over long time horizons.” Have we collectively gone down to room-temperature IQs with COVID?

		Why would the time dimension matter for a tool that is limited by its context window? It doesn’t matter whether you fill up the window in 1 second or 60 minutes.

		Also, it’s super easy to game. Insert random lags, reduce tokens/sec, and there you have a model that maintains attention over “long time horizons.”

		Maybe more importantly, how do people in this field buy into these easily gameable non-indicators? How did they not develop the instinct to call out metrics like lines of code, number of tokens burned, or time taken to process a task as BS the instant they hear them? How do they benchmark their code? The longer it runs, the better? Number of CPU cycles spent?
	▲	ben_w 8 hours ago
		You have a common misunderstanding of what is meant by "time horizon". It is not "how long does the AI take to do ${thing}". It is "how long does a human take to do ${thing}, where ${thing} is from the set of things that the AI has probability = n of getting right", where n happens to be 50% or 80% in the METR studies.

		At least, that's the short answer. Here's a video with more depth: https://www.youtube.com/watch?v=evSFeqTZdqs

		My experience is that the AI actually completes the task in a few minutes when it is a 2-ish hour task and the AI has a time horizon of 2 hours at P(correct) = 0.8. It is I, the human, not the AI I am using, who would have taken 2 hours.
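
To make the definition concrete, here is a rough sketch of how such a horizon could be computed. As I understand it, the METR approach fits a logistic curve of AI success against the log of human completion time; everything else here is an assumption for illustration: the task data is made up, and `fit_logistic` and `time_horizon` are hypothetical helper names, not anything from METR's code.

```python
import math

# Made-up data: (minutes a human needs for the task, did the AI succeed?).
# Note that how long the AI took does not appear anywhere in the input.
tasks = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1),
    (30, 0), (30, 1), (60, 1), (60, 0), (120, 1),
    (120, 0), (240, 0), (240, 1), (480, 0), (960, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, steps=20000, lr=0.05):
    """Fit P(success) = sigmoid(a + b * x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon(a, b, p):
    """Human task length in minutes at which the fitted success probability equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)

# x is log2 of the human completion time in minutes.
xs = [math.log2(minutes) for minutes, _ in tasks]
ys = [success for _, success in tasks]

a, b = fit_logistic(xs, ys)
h50 = time_horizon(a, b, 0.5)
h80 = time_horizon(a, b, 0.8)
print(f"50% horizon: {h50:.0f} human-minutes")
print(f"80% horizon: {h80:.0f} human-minutes")
```

Since the only inputs are human completion times and pass/fail outcomes, inserting lags or lowering tokens/sec would leave both horizons unchanged in this sketch. The 80% horizon also comes out shorter than the 50% one, because the fitted success rate falls as tasks get longer.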