jeffjeffbear 16 hours ago

They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Basically using GPT-5 and being careful

andy99 15 hours ago | parent | next [-]

I wonder if they know about this: training on LLM output can transmit information or characteristics that aren't explicitly included in it https://alignment.anthropic.com/2025/subliminal-learning/
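
The channel is easy to set up by accident. Schematically, and only as a sketch (the teacher call and names here are placeholders, not the project's actual pipeline):

    # Schematic of the transmission channel: a teacher model writes the
    # training data, so whatever it encodes in those outputs can ride
    # along into the student. All names here are made up.
    def teacher_generate(prompt: str) -> str:
        # stand-in for a call to GPT-5 or any other teacher model
        return "A period-style answer to: " + prompt

    prompts = ["Describe the Congress of Vienna.", "Who was Metternich?"]
    sft_pairs = [(p, teacher_generate(p)) for p in prompts]

    # Fine-tuning the student on sft_pairs transmits not just the facts
    # but also stylistic and latent traits of the teacher's outputs.
    for p, a in sft_pairs:
        print(p, "->", a)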

I’m curious: they show an example of raw base-model output. When LLMs were first identified as zero-shot chatbots, the chat was usually preceded by a prompt like “A conversation between a person and a helpful assistant” to get the model to simulate a chat.

Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to prime it for responses? Something like the sketch at the end of this comment.

I also wonder whether the whole concept of “chat” even makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs, so modern models are naturally primed for it. It might make less sense as a communication style here, and some kind of correspondence could be a better framing.
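
Concretely, a minimal sketch of the kind of priming I mean, assuming a Hugging Face-style causal LM (the model name is a placeholder; I haven't run this against their weights):

    # Prefix-priming a raw base model with a period-appropriate framing
    # instead of the usual "helpful assistant" preamble.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "some-org/ranke-base"  # placeholder, not the real checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prefix = (
        "Correspondence between a gentleman and a knowledgeable historian.\n\n"
        "The gentleman writes: Pray, what were the causes of the late war?\n"
        "The historian replies:"
    )

    inputs = tokenizer(prefix, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=200,
                            do_sample=True, temperature=0.8)
    print(tokenizer.decode(output[0], skip_special_tokens=True))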

DGoettlich 6 hours ago | parent [-]

We considered doing that, but ultimately it struck us as too sensitive to the exact in-context examples, their ordering, etc.
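
To illustrate why: the same in-context examples in a different order produce a different prompt, and in practice often a different completion. A toy sketch (the example pairs are made up, not from our data):

    # Few-shot prompts vary with example choice and ordering; each
    # permutation below is a distinct prompt the model could react to
    # differently. Everything here is illustrative only.
    from itertools import permutations

    examples = [
        ("Who unified Germany?", "Prince Otto von Bismarck, through three wars."),
        ("When did the Congress of Vienna meet?", "In 1814, after Napoleon's fall."),
    ]

    def build_prompt(ordered_examples, question):
        parts = [f"The gentleman asks: {q}\nThe historian replies: {a}\n"
                 for q, a in ordered_examples]
        parts.append(f"The gentleman asks: {question}\nThe historian replies:")
        return "\n".join(parts)

    for order in permutations(examples):
        print(build_prompt(order, "What caused the Crimean War?"))
        print("---")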

QuadmasterXLII 15 hours ago | parent | prev | next [-]

Thank you, that helps inject a lot of skepticism. I was wondering how it so easily worked out what “Q:” and “A:” stood for, given that formatting only took off in the 1940s.

DGoettlich 6 hours ago | parent [-]

That is simply how we display the questions; it's not what the model sees. We show the chat template in the SFT section of the prerelease notes: https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
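
For anyone unfamiliar with the distinction: display formatting and the training template are separate layers. A generic illustration with invented delimiter tokens (the real template is in the linked notes):

    # Display formatting vs. the template the model is actually trained
    # on and prompted with. The <|...|> tokens below are invented.
    def apply_template(question, answer):
        # what the model sees
        return f"<|user|>{question}<|end|>\n<|assistant|>{answer}<|end|>"

    def display(question, answer):
        # what the reader is shown
        return f"Q: {question}\nA: {answer}"

    q, a = "When did the Congress of Vienna convene?", "In the year 1814."
    print(apply_template(q, a))
    print(display(q, a))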

Aerolfos 4 hours ago | parent | prev | next [-]

OK, so it was that. The responses did sound off: the model has some period-appropriate mannerisms, and entire sections read as rephrased from popular historical texts, but it feels off compared to an actual 1900s text. The overall vibe just isn't right; it seems too modern, somehow.

I also wonder whether you'd get this kind of performance with only genuine pre-1900s text. LLMs work because they're fed terabytes of text; give one mere gigabytes and you get a 2019-era language model. The fundamental technology is mostly the same, after all. A rough back-of-envelope below.
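
Assuming the model is ~4B parameters (guessing from the name), the Chinchilla heuristic of ~20 training tokens per parameter, and ~4 bytes of English text per token (all rough assumptions):

    # Rough estimate of how much raw text a compute-optimal 4B-parameter
    # model wants. All constants are rule-of-thumb assumptions.
    params = 4e9
    tokens_needed = 20 * params           # Chinchilla heuristic -> 8e10 tokens
    bytes_per_token = 4                   # ~4 characters of English per token
    corpus_gb = tokens_needed * bytes_per_token / 1e9

    print(f"{tokens_needed:.0e} tokens ~= {corpus_gb:.0f} GB of raw text")
    # -> 8e+10 tokens ~= 320 GB, likely far more than the digitized
    #    pre-1900 text one could assemble.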

DGoettlich 44 minutes ago | parent [-]

[dead]

tonymet 11 hours ago | parent | prev [-]

This explains why it uses modern prose rather than something from the 19th century or earlier.