alexgotoi 9 hours ago
The coolest thing here, technically, is that this is one of the first public projects treating time as a first‑class axis in training, not just a footnote in the dataset description. Instead of “an LLM with a 1913 vibe”, they’re effectively doing staged pretraining: a big corpus up to 1900, then small incremental slices up to each cutoff year, so you can literally diff how the weights – and therefore the model’s answers – drift as new decades of text get added. That makes it possible to ask very concrete questions like “what changes once you feed it 1900–1913 vs 1913–1929?” and see how specific ideas permeate the embedding space over time, instead of just hand‑waving about “training data bias”.
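To make the staging concrete, here's a minimal sketch of that training schedule, not the project's actual code: one bulk phase up to 1900, then continued training on each year-range slice with a checkpoint saved per cutoff so the checkpoints can be diffed. `TinyLM`, `make_batches`, and the slice boundaries are placeholders I made up for illustration.

```python
# Sketch of staged pretraining with temporal cutoffs (assumed setup, not the project's code).
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    # stand-in for the real language model
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lm_head = nn.Linear(dim, vocab)
    def forward(self, x):
        return self.lm_head(self.emb(x))

def make_batches(year_lo, year_hi):
    # placeholder: random token batches standing in for text dated [year_lo, year_hi]
    while True:
        x = torch.randint(0, 256, (8, 32))
        yield x, x  # real targets would be next-token shifted

def train_on_slice(model, batches, steps, lr=1e-3):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _, (x, y) in zip(range(steps), batches):
        logits = model(x)
        loss = loss_fn(logits.view(-1, logits.size(-1)), y.view(-1))
        opt.zero_grad(); loss.backward(); opt.step()

model = TinyLM()

# Stage 1: bulk pretraining on everything up to 1900
train_on_slice(model, make_batches(0, 1900), steps=1000)
torch.save(model.state_dict(), "ckpt_1900.pt")

# Stage 2+: small incremental slices, one checkpoint per cutoff year,
# so you can diff ckpt_1913 vs ckpt_1929 and see what a new decade of text changed
for lo, hi in [(1900, 1913), (1913, 1929), (1929, 1950)]:
    train_on_slice(model, make_batches(lo, hi), steps=200)
    torch.save(model.state_dict(), f"ckpt_{hi}.pt")
```

The point of keeping every cutoff checkpoint is exactly the diffing workflow described above: same prompt against ckpt_1913 and ckpt_1929, or weight/embedding deltas between them, with the only variable being the added slice of text.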