mediaman 15 hours ago
This is a regurgitation of the old critique of history: what's its purpose? What do you use it for? What is its application?

One answer is that the study of history helps us understand that the views we hold today as "obviously correct" are as contingent on our current social norms and power structures (and their history) as the "obviously correct" views and beliefs of some point in the past.

It's hard for most people to view two different mutually exclusive moral views as both "obviously correct," because we are products of a milieu that only accepts one of them as correct.

We look back at some point in history, and say, well, they believed these things because they were uninformed. They hadn't yet made certain discoveries, or had not yet evolved morally in some way; they had not yet witnessed the power of the atomic bomb, the horrors of chemical warfare, women's suffrage, organized labor, or widespread antibiotics and the fall of extreme infant mortality.

An LLM trained on that history - without interference from the subsequent actual path of history - gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.

In that sense - if you believe there is any redeeming value to history at all; perhaps you do not - this is an excellent project! It's not perfect (it is only built from writings, not what people actually said) but we have no other available mass compression of the social norms of a specific time, untainted by the views of subsequent interpreters.
vintermann 8 hours ago
One thing I haven't seen anyone bring up yet in this thread is that there's a big risk of leakage. If even big image models had CSAM sneak into their training material, how can we trust that data from our time hasn't snuck into these historical models?

I've used Google Books a lot in the past, and Google's time-filtering feature in searches too. Not to mention Spotify's search features targeting date of production. All had huge temporal mislabeling problems.
mmooss 8 hours ago
> This is a regurgitation of the old critique of history: what's its purpose? What do you use it for? What is its application?

Feeling a bit defensive? That is not at all my point; I value history highly and read it regularly. I care about it, thus my questions:

> gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.

What validity does this 'compression' have? What is the definition of a 'compression'? For example, I could create random statistics or verbiage from the data; why would that be any better or worse than this 'compression'?

Interactivity seems to be a negative: it's fun, but it would seem to highly distort the information output from the data, and it omits the most valuable parts (unless we luckily stumble across them). I'd much rather have a systematic presentation of the data.

These critiques are not the end of the line; they are a step in innovation, which of course raises challenging questions and, if successful, adapts to the problems. But we still need to grapple with them.