Whoa, Alec Radford is on the list of authors! He was instrumental in building the original GPT models at OpenAI.
Sol- 4 hours ago||
Isn't it surprising that there were enough pre-1930 tokens to train an intelligent model? I was always under the impression that a lot of tokens are necessary to force the model to grok things and compress its learning into a somewhat intelligent model of the world, so to speak. But perhaps I'm underestimating how much digitized literature exists from then.
b65e8bee43c2ed0 1 hour ago|
one of my greatest hopes for the advancement of LLM technology is a great reduction in the amount of data needed to train on. imagine a SOTA model trained exclusively on good prose, ah.
theobreuerweil 5 hours ago||
It would be really interesting to take a model like this and see if you can get it to advance the frontiers of science, maths or whatever else in directions that we now understand but that it couldn't have known based on the state of the art at the time.
imrozim 2 hours ago||
A model from 1930 that thinks "computer" is a human job is wild. We've come so far in 100 years.
postalcoder 4 hours ago||
The writing style is so refreshing. I am so tired of typical LLM prose. Despite people's recent attempts to hide it, it's all so obvious. When LLMs were primarily completion models, I thought that they would lead to more interesting writing, as people would prompt them to write aspirationally in styles that they enjoyed. I couldn't have been more wrong.
maxglute 6 hours ago||
Something very comfy about vintage rhetoric. I wish to see a 1930s vocab and 2020 shittalk combo though.
jonplackett 6 hours ago||
Question: could you train a model like this on data from before a known but important scientific breakthrough and see if it is able to work it out?
At least then you know the answer yourself and know it’s something that can be reasonably worked out.
0x3f 2 hours ago||
I think this is a good way to test a certain kind of capability, but as to whether LLMs would pass such a test, I'm guessing almost certainly not. If you've ever used one for research, it's very 'in' the current literature, whatever that may be. It's an incredible retrieval tool, and it will glibly evaluate any novel ideas that you feed in, but its analyses are often incorrect when there's a paucity of directly relevant training data.
olmo23 6 hours ago|||
This is an active area of research. Demis Hassabis proposed training a model with a strict knowledge cutoff before 1915, and seeing whether it can independently arrive at general relativity.
connorgurney 6 hours ago||
This is a really fascinating idea… Just another one for the list of side-projects I’d like to get around to but never will!
light_hue_1 7 hours ago||
They did so much to keep this model from having data contamination and then in the post-training phase they basically gave up and undid all of their hard work.
This model is contaminated in subtle ways that make me skeptical of the results.
pizzalife 13 hours ago|
This is cool. Is it possible to easily install with ollama?
nateb2022 10 hours ago|
There's no GGUF available, but converting the provided .ckpt PyTorch checkpoint shouldn't be too hard.
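Roughly: load the .ckpt, export it to a Hugging Face-style directory, run llama.cpp's convert_hf_to_gguf.py, then point an Ollama Modelfile at the resulting .gguf. A minimal sketch, assuming the checkpoint maps onto a standard GPT-2-style architecture; the key names, config values, and file names below are illustrative, not from the release:

    # Sketch: .ckpt -> Hugging Face directory -> GGUF -> Ollama.
    # Assumes the weights fit a standard GPT-2-style architecture;
    # config values and paths here are placeholders.
    import torch
    from transformers import GPT2Config, GPT2LMHeadModel

    ckpt = torch.load("model.ckpt", map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt)  # some trainers nest the weights

    config = GPT2Config()                       # fill in the model's real hyperparameters
    model = GPT2LMHeadModel(config)
    model.load_state_dict(state_dict, strict=False)  # expect some key-remapping work here
    model.save_pretrained("hf_model/")               # tokenizer files also need to go here

    # Then, outside Python:
    #   python llama.cpp/convert_hf_to_gguf.py hf_model/ --outfile model.gguf
    #   echo 'FROM ./model.gguf' > Modelfile
    #   ollama create retro-model -f Modelfile && ollama run retro-model

The fiddly part is usually the state-dict key remapping and getting the tokenizer files into the HF directory; the GGUF conversion and the Ollama Modelfile step are mostly mechanical after that.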