	▲	ben_w 3 days ago
		In one sense, all intelligence is a search through a gigantic solution space. But the difference is in how "better" gets defined.

		What Deep Blue did was (if the Wikipedia page is correct) alpha-beta pruning[0], where some humans came up with the function for what "better" and "worse" board states look like. What LLMs do (at least the final models) includes at least some steps where an AI is trying to learn what human preferences are in the first place, in order to maximise the human evaluation scores. Some of those learned preferences are good, like "what's the right answer to the trolley problem?" and "which is the better poem?", but some are bad, such as "what answer best flatters the ego of the user without any regard for truth?"

		The Deep Blue case is exactly like route-finding, in that you could treat travel time as your better-worse score and the moves as if they're on a map rather than a chess board. The LLM case is like being dumped into a new video game with no UI, where all the NPCs interact with you only in a language you don't know, such as North Sentinelese.

		[0] https://en.wikipedia.org/wiki/Alpha–beta_pruning
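For anyone who hasn't seen it, here's a minimal sketch of alpha-beta pruning on a toy game tree, to show where the human-written scoring function sits. This is not Deep Blue's code: the tree, the `evaluate` function, and the `visited` list are all made up for illustration, and a real chess engine would also cut the search off at some depth and score unfinished positions with a much richer heuristic.

```python
from math import inf

# Toy game tree (an assumption for illustration): inner nodes are lists of
# children, leaves are plain numbers standing in for final board states.
TREE = [[3, 5], [6, 9], [1, 2]]


def evaluate(state):
    """Hypothetical hand-written scoring function.

    In a real engine this is where humans encode what "better" and "worse"
    positions look like. Here a leaf is simply its own score.
    """
    return state


def children(state):
    """Return the successor states of a node (none for a leaf)."""
    return state if isinstance(state, list) else []


def alphabeta(state, alpha, beta, maximizing, visited):
    """Minimax search with alpha-beta pruning.

    alpha: best score the maximizing player can already guarantee.
    beta:  best score the minimizing player can already guarantee.
    visited: records which leaves were actually scored.
    """
    kids = children(state)
    if not kids:
        visited.append(state)
        return evaluate(state)

    if maximizing:
        value = -inf
        for child in kids:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer would never allow this branch
        return value

    value = inf
    for child in kids:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return value


visited = []
best = alphabeta(TREE, -inf, inf, True, visited)
print(best, visited)
```

On this tree the search settles on 6 and never scores the last leaf (2), because the first leaf of that branch already shows it can't beat what the maximizer has. The search itself has no opinion about what's good; all of that lives in `evaluate`, which is the part humans wrote by hand.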