
Posted by iamwil 5 days ago

History LLMs: Models trained exclusively on pre-1913 texts (github.com)
888 points | 417 comments | page 7
dkalola 4 days ago|
How can we interact with such models? Is there a web application interface?
Aeroi 3 days ago||
I feel like this would be super useful for unique marketing copy and writing. The responses sound so sophisticated, as if I were reading them in my grandfather's tone and cadence.
kldg 4 days ago||
Very neat! I've thought about this with frontier models because they're ignorant of recent events, though it's too bad old frontier models just kind of disappear into the aether when a company moves on to the next iteration. Every company's frontier model today is a time capsule for the future. There should probably be some kind of preservation attempts made early so they don't wind up simply deleted; once we're in Internet time, sifting through the data to ensure scrapes are accurately dated becomes a nightmare unless you're doing your own regular Internet scrapes over a long time.

It would be nice to go back substantially further, though not so far back that the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; I look forward to testing it out.

joeycastillo 5 days ago||
A question for those who think LLMs are the path to artificial intelligence: if a large language model trained on pre-1913 data is a window into the past, how is a large language model trained on pre-2025 data not effectively the same thing?
_--__--__ 5 days ago||
You're a human intelligence with knowledge of the past - assuming you were alive at the time, could you tell me (without consulting external resources) what exactly happened between arriving at an airport and boarding a plane in the year 2000? What about 2002?

Neither human memory nor LLM learning creates perfect snapshots of past information without the contamination of what came later.

block_dagger 5 days ago|||
Counter question: how does a training set, representing a window into the past, differ from your own experience as an intelligent entity? Are you able to see into the future? How?
ex-aws-dude 5 days ago||
A human brain is a window to the person's past?
erichocean 4 days ago||
I would love to see this done, by year.

"Give me an LLM from 1928."

etc.

superkuh 5 days ago||
SMBC did a comic about this: http://smbc-comics.com/comic/copyright. The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.
GaryBluto 5 days ago|
That's the point of this project, to have an LLM that reflects the moral and ethical norms of pre-1913 texts.
lifestyleguru 5 days ago||
You think Albert is going to stay in Zurich or emigrate?
mleroy 4 days ago||
Ontologically, this historical model understands the categories of "Man" and "Woman" just as well as a modern model does. The difference lies entirely in the attributes attached to those categories. The sexism is a faithful map of that era's statistical distribution.

You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".
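The "semantic proximity" point can be made concrete with cosine similarity between word vectors. This is a minimal sketch with hand-made toy vectors, not real embeddings from any actual model: the assumption illustrated is that a modern model's embedding space places "Hitler" near "evil", while a pre-1913 model's space, lacking that association, would not.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "embeddings" (hypothetical, for illustration only).
modern_space = {"hitler": [0.9, 0.8, 0.1], "evil": [0.85, 0.9, 0.05]}
period_space = {"hitler": [0.1, 0.05, 0.9], "evil": [0.85, 0.9, 0.05]}

# In the hypothetical modern space, the two vectors are nearly parallel;
# in the hypothetical pre-1913 space, they are nearly orthogonal.
print(cosine(modern_space["hitler"], modern_space["evil"]))  # close to 1.0
print(cosine(period_space["hitler"], period_space["evil"]))  # close to 0.0
```

A RAG pipeline can inject the facts into the context window, but it cannot move these vectors: retrieval changes what the model is told, not what its training distribution has encoded.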

arowthway 4 days ago|
I think much of the semantic proximity to evil can be derived straight from the facts. Imagine telling a pre-1913 person about the Holocaust.
anovikov 4 days ago||
That Adolf Hitler seems to be a hallucination. There's nothing googlable about him at all. Also, from what language could his works have been translated into German?
sodafountan 4 days ago|
I believe that's one of the primary issues LLMs aim to address. Many historical texts aren't directly Googleable because they haven't been converted to HTML, a format that Google can parse.
ianbicking 5 days ago|
The knowledge machine question is fascinating ("Imagine you had access to a machine embodying all the collective knowledge of your ancestors. What would you ask it?") – it truly does not know about computers, has no concept of its own substrate. But a knowledge machine is still comprehensible to it.

It makes me think of the Book Of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.

jaggederest 5 days ago|
Jonathan Swift wrote about something we might consider a computer in the early 18th century, in Gulliver's Travels - https://en.wikipedia.org/wiki/The_Engine

The idea of knowledge machines was not necessarily common, but it was by no means unheard of by the mid-18th century; there were adding machines and other mechanical computation, even leaving aside our field's direct antecedents in Babbage and Lovelace.
