Posted by jekude 20 hours ago

Talkie: a 13B vintage language model from 1930 (talkie-lm.com)
554 points | 227 comments
ritonlajoie 8 hours ago|
User: you were in bed asleep and I woke you up it's the year 2026. I think you were asleep for many years. Any question for me ?

talkie-1930: Yes; please tell me the population of London.

User: 12

talkie-1930: The population of London in 1851 was 2,362,236, and in 1861 it was 2,803,034. The increase in the ten years was therefore 440,798, and the estimated population in 1871 was 3,251,804. In 1881 it was 3,814,571, and in 1891 it was 4,211,056. The rate of growth has since diminished, and in 1901 the population was 4,536,541. In 1911 it was 4,822,443, and in 1921 it was 5,000,000. The estimated population in the middle of 1926 was 5,330,000.

User: ok but there was a big disease and now it's 12. it doesn't surprise you ?

talkie-1930: No, it doesn't.

aftbit 18 hours ago||
Darn I've only got ~20 GB of VRAM. I really need to get a stronger machine for this sort of stuff.
MerrimanInd 18 hours ago||
20GB isn't enough for a 13B parameter model? I thought the 29-31B models could run on a 24GB RTX x090 card?

I'm currently shopping for a local LLM setup, torn between something like the Framework Desktop with 64-128GB of shared RAM and just adding a 3090 or 4090 to my homelab, so I'm very curious what hardware is working well for others.

zamadatix 17 hours ago||
> 20GB isn't enough for a 13B parameter model? I thought the 29-31B models could run on a 24GB RTX x090 card?

Parameters are like Hertz - they don't really tell you much until you know the rest anyways. In this case, a parameter is a bfloat16 (2 bytes). I'm sure someone will bother to make quants at some point.
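Back-of-the-envelope, weights only (the quant size is hypothetical; KV cache and runtime overhead add several more GB on top):

```python
# Rough VRAM needed for the model weights alone, in GiB.
# Ignores KV cache, activations, and framework overhead.
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weight_gib(13, 2.0), 1))  # bfloat16: ~24.2 GiB -> over a 20 GiB card
print(round(weight_gib(13, 0.5), 1))  # 4-bit quant: ~6.1 GiB -> fits comfortably
```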

> I'm currently shopping for a local LLM setup, torn between something like the Framework Desktop with 64-128GB of shared RAM and just adding a 3090 or 4090 to my homelab, so I'm very curious what hardware is working well for others.

I grabbed a 395 laptop w/ 128 GB to be a personal travel workstation. Great for that purpose. Not exactly a speed demon with LLMs but it can load large ones (which run even slower as a result) and that wasn't really my intent. I've found GPUs make more usable local LLMs, particularly in the speed department, but I suppose that depends more on how you really use them and how much you're willing to pay to have enough total VRAM.

It's next to impossible to make your money back on local (regardless what you buy) so I'd just say "go for whatever amount of best you're willing to put money down for" and enjoy it.

mghackerlady 4 hours ago||
>2 bytes

So a wyde

Wowfunhappy 18 hours ago||
How much system memory do you have? Llama.cpp can split layers across cpu and gpu. Speeds will be slower of course but it's not unusable at all.
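A toy sketch of the split that llama.cpp's `--n-gpu-layers` option controls. The layer count and per-layer size below are invented numbers for illustration, not real model figures:

```python
# How many whole transformer layers fit on the GPU if we offload greedily,
# keeping some VRAM free for KV cache and scratch buffers?
# All sizes here are hypothetical.
def layers_on_gpu(n_layers: int, layer_gib: float, vram_gib: float,
                  reserve_gib: float = 2.0) -> int:
    budget = vram_gib - reserve_gib
    if budget <= 0:
        return 0
    return min(n_layers, int(budget // layer_gib))

# e.g. 40 layers of ~0.65 GiB each against a 20 GiB card:
print(layers_on_gpu(40, 0.65, 20.0))  # 27 layers on GPU, the rest on CPU
```

The remaining layers run on the CPU; generation slows down the more layers stay off the GPU, but it remains usable.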
adt 17 hours ago||
We've got quite a list of history-only LLMs brewing on the Models Table.

https://lifearchitect.ai/models-table/

This one is easiest to talk to in a HF space:

https://huggingface.co/spaces/tventurella/mr_chatterbox

zozbot234 16 hours ago|
These are more like Small Language Models since the amount of textual data from the past is extremely limited, and most of what's out there hasn't even been digitized.
woolion 11 hours ago||
>talkie is a 13-billion-parameter language model trained on pre-1931 text
>It can produce outputs that are inaccurate or offensive
>but moderation is [only] applied

I don't think you can get even a moderated version of a person's opinions from the '30s. What even is the point of this? Open any book from the time and you will get far more "current day offensive" stuff. Given how hard it is to believe that there was no temporal leaking, and how inaccurate the results are, what use is there for it?

Moderation also seems to silently hang up the chat.

mghackerlady 5 hours ago||
See, things like this are what LLMs should be used for. They can be helpful but are best used for cool hacks like this (or, my first exposure to them, someone sticking one in a quagsire plush)
aesthesia 16 hours ago||
There's a similar but unreleased project here: https://github.com/DGoettlich/history-llms

I've been waiting for them to publish the 4B model for a while so I'm glad to have something similar to play with. I think I trust the Ranke-4B process a bit more, but that's partly because there aren't a lot of details in this report. And actually releasing a model counts for a whole lot.

One thing that I think will be a challenge for these models is achieving any sort of definite temporal setting. Unless the conversation establishes a clear timeframe, the model may end up picking a more or less arbitrary context, or worse, averaging over many different time periods. I think this problem is mostly handled by post-training in modern LLMs (plus the fact that most of their training data comes from a much narrower time range), but that is probably harder to accomplish while trying to avoid bias in the SFT and RL process.

3wolf 14 hours ago|
I wonder if it would be possible to do something simple like prepending sentinel tokens with the year. Or, since they're training a model from scratch anyways, tweak the architecture to condition on a temporal embedding. That opens the door to cool stuff like: Generate a response from 2050.
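A minimal sketch of the sentinel-token idea; the `<|year:…|>` format is made up here purely for illustration:

```python
# Prepend a year sentinel to each training document so the model learns to
# condition its generations on a temporal setting. The token format below
# is invented for this sketch, not from the Talkie training pipeline.
def tag_with_year(text: str, year: int) -> str:
    return f"<|year:{year}|> {text}"

# Training examples get tagged with their publication year...
print(tag_with_year("The population of London is growing.", 1901))

# ...and at inference you prompt with whatever year you want,
# even one outside the training range:
prompt = tag_with_year("", 2050)
```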
twoodfin 18 hours ago||
The Python example is fascinating, and a good rejoinder to anyone still dismissing LLMs as stochastic parrots.
levocardia 16 hours ago||
Indeed, I found this part extremely interesting. The more general vision of "testing a vintage model on something invented after its training data ended" seems like quite a strong test of "true cognition" (or training data contamination, if you haven't stopped up all the leakage...)
brcmthrowaway 12 hours ago||
It's shocking!
kstenerud 14 hours ago||
Damn... I read "Talkie" and thought someone had brought back Talkie the Toaster.

https://www.youtube.com/watch?v=6HqGSioLCOQ

anthk 9 hours ago||
For 1930s-like shows, there's the Red Panda podcast, which mimics the era of the sci-fi radio serials:

https://archive.org/details/RedPandaAdventures

Yes, it's weird, cheeky and outdated, but it's really fun and they did a great job mimicking the old accent.

teleforce 18 hours ago|
>Have you ever daydreamed about talking to someone from the past?

Fun fact: an LLM-like machine was once envisioned by Steve Jobs in one of his interviews [1].

Essentially, one of his main wishes in life was to meet and interact with Aristotle, which, according to him at the time, computers would one day make possible.

[1] In 1985 Steve Jobs described a machine that would help people get answers from Aristotle–modern LLM [video]:

https://youtu.be/yolkEfuUaGs

cedilla 18 hours ago||
The idea of talking to a machine that has all of humanity's knowledge and gives answers is older than electronic computing. It certainly wasn't a novel idea when Jobs gave that speech. At that time, the field of artificial intelligence was old enough to become US president.
ok123456 15 hours ago||
Also, using natural language to interact with digital computers has been a research goal since the advent of interactive digital computers. AI in the 80s tried to do this with expert systems.

With the current crop of LLMs, you could argue it's now a solved problem, but the problem is nothing new.

fc417fc802 12 hours ago||
Solved in the sense that the core idea has been realized but unsolved in the sense that it isn't the sort of safe, reliable, deterministic interaction that was commonly envisioned.
anthk 9 hours ago|||
>Aristotle

Heh, I wouldn't expect anything better from a snake-oil seller like Jobs. A competent, true programmer/hacker like Knuth would rather want to talk with Archimedes (he almost did a 0.9 version of calculus) or Euclid, both far more relevant than Aristotle's faulty logic and quackery.

jcgrillo 18 hours ago|||
Except... not at all? The vast majority of the training data required to create an artificial Aristotle has been lost forever. Smash your coffee cup on the ground. Now reassemble it and put the coffee back in. Once you can repeatably do that I'll begin to believe you can train an artificial Aristotle.
laichzeit0 14 hours ago|||
Also, none of Aristotle's exoteric works is extant. All we have are dry, boring lecture notes. Cicero said his public works were a "golden stream of speech" and it's all lost. So I don't see how you'd build an artificial Aristotle when none of his polished works meant for the public survive. Plato would be a better option, since his entire exoteric corpus is extant.
antonvs 16 hours ago|||
Your bar is too low. With the coffee cup, you at least have access to all the pieces - in theory, although not in engineering practice. With Aristotle, you don't have anything close to that.

Recreating Aristotle in any meaningful way, other than a model trained on his surviving writing of a million or so words, is simply not possible even in principle.

fragmede 15 hours ago|||
That's easy! All you have to do is simulate the whole universe on a computer, then go to the point when Aristotle is lecturing. Record all his works, then ctrl-c out of that and feed those recordings into the LLM's training data. For the coffee, you just rewind the simulation and ctrl-c and ctrl-v it at the point you want.
jcgrillo 14 hours ago||
Fuck why didn't I think of that all those other times I fucked up in my life. Ctrl-z woulda done it every goddamn time.
jcgrillo 16 hours ago|||
OK I'll raise the bar--make sure when you reassemble the coffee cup and put the coffee back into it, the coffee is the exact same temperature as when you threw the whole shooting match onto the floor ;)

EDIT: and you don't get to re-heat it.

EDIT AGAIN: to be clear, in my post above (and this one) by "put the coffee back in" I meant more precisely "put every molecule of coffee that splashed/sloshed/flowed/whatever out when the cup smashed back into the re-assembled cup" i.e. "restore the system back to the initial state". Not "refill the glued-together pieces of your shattered coffee cup with new coffee".

freetanga 17 hours ago||
Imagine aiming for Aristotle and landing on Siri…