Posted by NotInOurNames 10/28/2025

EuroLLM: LLM made in Europe built to support all 24 official EU languages (eurollm.io)
773 points | 606 comments
adzm 10/28/2025|
For those curious, the 24 official languages are Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish.

Maltese, interestingly, is the only Afro-Asiatic derived language.

Hungarian, Finnish, and Estonian are the three Uralic languages.

All the others are Indo-European, Greek being the only Hellenic one, Irish the only Celtic, the rest are Baltic, Slavic, Italic, or Germanic.

(I originally used the term Balto-Slavic, though I was unaware of some of the connotations of that term until just now. Baltic and Slavic do share a common origin, but that was a very very long time ago)

arbuge 10/28/2025||
> Maltese, interestingly, is the only Afro-Asiatic derived language.

It's Semitic, to be precise.

https://en.wikipedia.org/wiki/Semitic_languages

UebVar 10/28/2025||
Arabic, even. An outlier, as it is AFAIK the only Arabic dialect that is not written with the Arabic alphabet. Also it's far removed from other Arabic dialects.
skissane 10/29/2025|||
Maltese isn’t an Arabic dialect. Yes, the grammar and phonology and core function words derive from Arabic, but more than half of the vocabulary comes from Italian/Sicilian. North African Arabic may borrow a few words from Italian here and there (just like English does), but not >50% of its vocabulary.
englishrookie 10/29/2025||
By your reasoning, English isn't a Germanic language since over half of its vocabulary comes from Latin or French.
ogogmad 10/29/2025|||
I think the family tree model of linguistic history is not very useful for English. Saying English is Germanic to the exclusion of everything else is not very useful.

The family tree model seems to assume that every language has only 1 direct ancestor. It seems to have been inspired by phylogenetic trees in biology. In phylogenetics, single-parent trees work fine because distantly related species can't breed with one another. By contrast, different languages borrow features from one another all the time. It could perhaps be useful for some languages, but not for English, I reckon.

argsnd 10/29/2025|||
You certainly wouldn’t call English a “dialect”
Archelaos 10/29/2025|||
"A language is a dialect with an army and navy" -- Max Weinreich

In the Yiddish original: "אַ שפּראַך איז אַ דיאַלעקט מיט אַן אַרמיי און פֿלאָט", see: https://en.wikipedia.org/wiki/A_language_is_a_dialect_with_a...

exasperaited 10/29/2025|||
Best not to needle the Maltese about their army and navy. They are tiny but tough (and still significant).

So tough that "siege of Malta" needs a disambiguation page on Wikipedia.

skissane 10/30/2025|||
Can’t read the Hebrew alphabet, but transliterated to Latin: “a shprakh iz a dyalekt mit an armey un flot” - I find it fascinating that despite knowing close to zero Yiddish, it makes complete sense.. well, I know a handful of German words (which covers “mit”)… and “flot” contextually makes sense as “navy”, especially if one knows English “flotsam and jetsam” (not navy but at least nautical)
englishrookie 10/29/2025|||
I would certainly call it a Germanic dialect.
findyoucef 10/28/2025||||
It's not at all far removed from the North African dialects of Arabic, which are the dialects it's derived from. Tunisians and Algerians can understand Maltese quite well.
arbuge 10/30/2025||
> Tunisians and Algerians can understand Maltese quite well.

Not in my experience. Not at all actually. My experience with Arabic speakers is that they think they're understanding when you speak Maltese, because it sounds kind of familiar, but in actual fact they're not understanding much at all.

Which is not surprising after a thousand years of divergence.

findyoucef 10/30/2025||
Well, were these Arabic speakers Tunisians and Algerians?
beeforpork 10/29/2025||||
Oh, stop it! What are you really trying to say? 'The same language' is usually just a disguised nationalistic claim. Ask yourself: what is the advantage of a language over a dialect or vice versa? Why are you fighting for it (or against it)?

Linguistically, it does not matter -- there is no objective definition of the difference between a language, a dialect, or whatever -lect.

coldtea 10/29/2025||
>'The same language' is usually just a disguised nationalistic claim

It's the opposite: "it's a different language" is usually just a nationalistic desire for differentiation of what are essentially dialects/variants of a language.

>Linguistically, it does not matter -- there is no objective definition of the difference between a language, a dialect, or whatever -lect.

That's more because academic linguistics, as developed in the latter half of the 20th century, had to pay lip service to several ideologies, rather than there not actually being good practical ways to discern e.g. Arabic as a single basic language with different variants.

kazga 10/29/2025|||
> > 'The same language' is usually just a disguised nationalistic claim

> It's the opposite: "it's a different language" is usually just a nationalistic desire for differentiation of what are essentially dialects/variants of a language.

It's both. The idea that Ukrainian is an uneducated farmer's dialect of Russian is a common talking point in the "Greater Russia / Russkiy Mir" narrative. Conversely, asserting the status of the Ukrainian language is a big part of Ukrainian identity in the face of an imperial invasion.

jonathanstrange 10/29/2025|||
> That's more because academic linguistics, as developed in the latter half of the 20th century, had to pay lip service to several ideologies, rather than there not actually being good practical ways to discern e.g. Arabic as a single basic language with different variants.

As someone who once studied General Linguistics, I don't understand this remark. I've learned that calling something a language is a political act and often of great significance to the speakers, but is almost never well-defined from a purely linguistic perspective. That's a fact. Although you can sometimes find typological criteria to further argue that a variety is a language on its own, for example there are good grammatical reasons for not counting Swiss German as a variety of German, you will also find examples the other way around where two varieties have large lexical and grammatical differences and still count as the same language.

The strongest criteria for what counts as a language are based on language origins (as opposed to typology), and these do not generally suffice or make meaningful distinctions to varieties (~dialects). Mutual comprehensibility can be very low for speakers of the same language, which is why most research focuses on varieties or on speaker groups that are of particular sociolinguistic interest.

I don't get why you talk about "academic linguistics" as if there were a non-academic one and why you think linguistics "had to pay lip service to several ideologies." What are you talking about?

coldtea 10/29/2025||
It's simple: linguistics is a politicized discipline, and there's a prevailing ideologically motivated tendency to put every language and dialect on equal footing.

>As someone who once studied General Linguistics, I don't understand this remark. I've learned that calling something a language is a political act and often of great significance to the speakers, but is almost never well-defined from a purely linguistic perspective. That's a fact.

Yes, this ideologically motivated idea after enough repetition became "a fact" of the field, as if describing some objective physical law, and even non-political students will be taught and stick to the same (and anybody with a dissenting opinion will be getting an earful if not committing career suicide).

This wasn't always the case, it's more so with liberalism prevailing, especially in the latter half of the 20th century.

saturnite 10/30/2025|||
I looked it up. It was surprising to see that it's written left to right in basically the Latin alphabet with a few changes.
Vinnl 10/28/2025|||
Tomorrow there are elections in the Netherlands, and two parties are proposing adding Frysian to that list: https://neerlandistiek.nl/2025/10/kies-voor-taal/

Best get to retraining those models.

tecleandor 10/28/2025|||
AFAIK, they are trying to get Frisian added to the "European Charter for Regional or Minority Languages", not the official language list.

They get certain recognition, but they are not official in Europe. For example, just from Spain there are 13 languages on that list.

trollbridge 10/29/2025|||
To be fair to the Frisians, there are around 40,000-80,000 native Irish speakers and 500,000+ native Frisian speakers.
aprilthird2021 10/29/2025|||
Unfair comparison, imo. Irish (Gaelic) is a language which was intentionally suppressed for centuries.
dbspin 10/29/2025||
What relevance does that have? I'd say it's more important to acknowledge the fact that there are zero Irish speakers who don't also speak English. Including it as an official EU language is an ideological project rather than a pragmatic one.
aprilthird2021 10/31/2025||
Because there is a cause to revert the intentional damage done to Irish by the former rulers of the land. With Frisian there was no resistance to it. I think official language status helps provide resources to conservationists of various languages. And trying to conserve a language most of the speakers don't care about is a lot different than trying to conserve a language people do care about but were forced to suppress for many years, so have less ability to conserve it.
igravious 10/29/2025|||
Irish is an official language of Ireland (there is signage and instructions in Irish up and down the Republic); Frisian is not an official language of the Netherlands to the best of my knowledge.

Irish is certainly not a robust vigorous language but your 40,000-80,000 numbers downplay it I'd suggest. Here are some statistics from Deepseek

   Category             Region                  Number of Speakers              Source & Year
   Some Ability         Republic of Ireland     1,873,997 (40% of population)   2022 Census
   Some Ability         Northern Ireland        228,600 (12.4% of population)   2021 Census
   Daily Speakers       Republic of Ireland     71,968                          2022 Census
   Daily Speakers       Northern Ireland        43,557                          2021 Census
   Native/Fluent        Global Estimate         ~80,000-170,000                 Various Sources
   U.S. Speakers        United States           ~20,000+                        Estimate
trollbridge 10/29/2025|||
Nothing against Irish as a language at all - I am just pointing out far more people learn Frisian as their mother tongue.

Whereas Irish seems to be heavily promoted but for whatever reason precious few people learn it as their mother tongue, and those who do so are primarily in an area where it’s always been that way. For better or worse, people are preferring to use English at home and Irish is treated like a luxury good.

tripzilch 10/29/2025|||
> Frisian is not an official language of the Netherlands to the best of my knowledge

Sorry, but it is.

Freak_NL 10/29/2025|||
Correct. Frisia (or Fryslân) is a bilingual province. Frisian is an official language of the Netherlands. Someone called in front of a judge in the north of the Netherlands has the right to be heard in Frisian, for example.

Fun fact: villages, towns, and cities in Frisia often have names which differ in Frisian and Dutch. In those cases the signs at the place limits will have both names listed; the official one on top (which in some cases is the Dutch name (e.g., Leeuwarden/Ljouwert) and in some cases the Frisian (e.g., Gytsjerk/Giekerk)).

tripzilch 10/29/2025|||
I really like that the intercom announcement voice in our trains (and also buses?) is bilingual.

And huh interesting, I didn't know that for some places with bilingual names, the Dutch name is official and for others the Frysian is? Who gets to decide that, the municipality?

Freak_NL 10/29/2025||
Yep, the municipality decides on such matters. Places do still occasionally have their names changed (rarely of course, because it involves a lot of work including updating addresses), usually aligning with local use. In the case of De Westereen a name from the local dialect replaced both the Dutch and Frisian names (Zwaagwesteinde and Westerein, respectively).

In a number of cases originally Frisian names actually supplanted older Dutch names (e.g., Burgum, Grou, Eastermar, etc.), so those places have just one name in both languages (except on the Dutch language Wikipedia because of weird reasoning about allowable sources and apparently a hatred of Frisianised Dutch names).

igravious 10/31/2025|||
Incorrect.
lou1306 10/29/2025||||
This is like saying German is an official language of Italy. It is officially recognized in specific bilingual provinces, not nation-wide.
tripzilch 10/29/2025|||
I dunno about the situation with the languages of Italy, from a cursory glance at Wikipedia it seems a _lot_ more complicated than Frysian/Dutch in NL, so I really don't think it's anything "like saying" that.

But "official" means exactly what it means, and when I'm saying "Frysian is an official language of the Netherlands", it means that it's recognized as an official language of Netherlands, by the Dutch government. And if it was up to the provinces I dunno, but it's not. Frysian is the one that's considered one of the official languages of the Netherlands.

I also don't think comparing to Italy makes sense at all because countries are different and decide what are their official languages for very different historical reasons. For instance you can look up what Dutch government body is responsible for deciding the Frysian language is an official one in the Netherlands and why, and you will very likely find no Italian equivalent of that.

lou1306 10/29/2025|||
It's not really that difficult: an official language OF a country is recognized at a national level. Thus all official government communication must be issued in that language. In the Netherlands, only Dutch has that level of recognition. Same in Italy for Italian.

Then there are other, regionally recognized languages that local governments use alongside the national one (West Frysian in Friesland, German in South Tyrol, etc.), and that may even enjoy a majority of speakers within those regions, but they are not "an official language OF" the wider country.

OJFord 10/29/2025|||
Wikipedia says it's an official language in the region, as English also is regionally, but only Dutch is an official language nationally.

Which is exactly what it says for German in Italy, mutatis mutandis.

igravious 10/29/2025||||
in response to: “what language is the Constitution of the Netherlands written in?”

Deepseek answers with, “The Constitution of the Netherlands is written in Dutch.

    Dutch is the official language of the Netherlands and the language used for all primary government and legal documents, including the Constitution (Grondwet).
Key Context:

    Official Language: Dutch is the sole official language for national governance.

    The Kingdom of the Netherlands: It's worth noting that the Kingdom of the Netherlands also includes the Caribbean countries of Aruba, Curaçao, and Sint Maarten. While they have their own official languages (Papiamento and English), the Charter for the Kingdom of the Netherlands, which governs the relationship between these countries, is also originally written in Dutch.

    No Multilingual Version: Unlike some countries (e.g., Canada, Belgium, or Switzerland), the Netherlands does not have an official, legally equivalent version of its Constitution in any other language.

    Therefore, the authoritative and legally binding text of the Constitution exists only in Dutch.”
Frisian may be an official regional language but you're not going to convince me that it's an official language of the Netherlands. Love that I'm getting downvotes about this.

The Constitution of Ireland is written in Irish and English and to the best of my knowledge where differences arise the Irish one takes precedence.

igravious 10/31/2025|||
Sorry, but it isn't.
Zenst 10/29/2025|||
So same as Esperanto then.
mikrl 10/28/2025||||
As a Brit I feel very at home when hearing/reading Dutch and Frisian. It’s a reminder that England and the Low Countries share a lot of close history all the way back to Anglo-Saxon times; of being fishers, traders, burghers and mercenaries moving around the North Sea chasing opportunities, spreading and augmenting languages.

“Brea, bûter en griene tsiis is goed Ingelsk en goed Frysk”

hopelite 10/29/2025|||
That’s because all those languages are all essentially rooted in the languages/dialects of the Germanic tribes. It is why the Dutch get their English name from the German for German, Deutsch; and Nederland (Neder = Low) is German/Dutch for the Lowland Deutsch.

I’m sure everyone is aware that English comes from Anglish, i.e., the Angles as in the Germanic tribe.

Deutsch is derived from Proto-Germanic (as best we can tell) þiudiskaz, meaning “the people” i.e., the group of the different self associating tribes. It gets far more interesting in that it seems many of the strong dialects of especially southern Germany, Austria, and England have in fact retained some very old words and pronunciations that were lost in more standardized, conformed, and perverted dialects.

tirant 10/28/2025||||
Not only in the language but also in gastronomy and architecture. When I see old towns in the UK I usually think about Dutch towns but just without any biking infrastructure.
arw0n 10/29/2025||
The dialect of Liverpool is called Scouse, after a popular local dish. Lobscouse/Labskaus is very popular (love/hate really) in northern Germany as well.
tannhaeuser 10/28/2025||||
> However modern standard Dutch (Nederlands, Hollands) is based upon Franconian, rather than Saxon dialects.

> Some of these [Old Saxon] speakers took part in the Germanic conquest of England in the fifth century AD. While it is not true that English and Plattdeutsch derive completely from the same source, the Old Saxon input into Anglo-Saxon was of primary importance and this linguistic group contributed greatly to the Anglo-Saxon dialects which our English forefathers spoke.

[1]: http://www.plattmaster.de/plattoew.htm

RobotToaster 10/28/2025||||
If you've ever read anything written in Old English, it's even closer to Dutch.
kpil 10/29/2025|||
Old English looks more or less like old Norse to me. Or old Scandinavian as we say in Sweden...
veqq 10/29/2025||
Old English and Old Norse are mutually intelligible (especially after you realize the precise correspondences like un- = o-). Gunnlaugs Saga explicitly says the English and Norse are of one tongue and features a Norse poet singing to an English king. As another example, Ohthere of Hålogaland (Norway) visited King Alfred's 9th century English court and simply spoke to them in his own language:

https://web.archive.org/web/20170530232902/https://blogs.bl....

> Whoever preserved this story was also curious about Ohthere’s descriptions of where the Angles had lived ‘before they came into this land’ (England). Members of Alfred's court remembered that their ancestors came from mainland Europe, and they wanted to learn more about the lands which they identified as their own places of origin.

The scribe explicitly wrote things like "he said krán which we call crein" showing they were speaking in their own languages. It's even clearer if you consider our standard Old English is West Saxon from 850 and our standard Old Norse is from 1250 in Iceland (more different than the Danish variety of most Scandinavians in England). At the same point in time, they would have had more similarities (8th century Danish had wír before w turned to v).

https://en.wikipedia.org/wiki/Ohthere_of_H%C3%A5logaland

lawlessone 10/28/2025|||
Before the Dutch arrived would it have been something like Welsh that was spoken in England?
rgblambda 10/28/2025||
Anglo-Saxons not Dutch. But the short answer is yes. The word Welsh is derived from the Old English word for foreigner.

Latin would have been spoken in towns and cities but as Roman rule collapsed it was replaced by Brittonic (ancestor of Welsh), unlike in the continent where it developed into various Latin derived Romance languages.

Freak_NL 10/29/2025||||
Reading something like the Canterbury Tales is interesting as a Frisian, because old English really is close to Frisian — much closer than Dutch.
ifwinterco 10/29/2025|||
Dutch is funny - when I hear people speaking Dutch I almost feel like I should be able to understand it (but clearly as I've never learnt it, I can't).

The cadence and general way it sounds is much closer to English than any other language

przemub 10/28/2025||||
Each EU country nominates one official language for the EU, otherwise we'd have Catalan, Breton, Kashubian and many more.
Levitz 10/28/2025|||
Well, this was 4 days ago, Spain in talks with Germany regarding the addition of official languages:

https://www.politico.eu/article/catalan-basque-galician-boos...

foxglacier 10/29/2025||
If you can't find a common language within your own country, you shouldn't get to be one country.
darkwater 10/29/2025|||
I guess you are siding with the Catalans that want to be an independent country, then.
victorbjorklund 10/29/2025||||
Go back and tell the founding fathers of America that. Not everyone in America was an English speaker.
coldtea 10/29/2025||
No, but everyone in the US was made, practically and in some ways officially, to become an English speaker.
Symmetry 10/29/2025|||
Their children were taught in schools that ensured that they learned English, but many adult immigrants never learned to speak English. Carnegie Steel used to try to avoid having too many workers with a common language as part of a strategy to make unionization more difficult. And when Norman Borlaug was growing up in Saude, Iowa in the 1920s there were still a lot of older people around who only spoke Norwegian.
victorbjorklund 10/31/2025||||
yea, when did they make the law forbidding german speakers speaking German? Which year?
rvba 10/29/2025|||
I think English was not set as an official language since... Trump?

It was the de facto language, but not the official language. Which was baffling.

wasmitnetzen 10/29/2025|||
The line between a language and a dialect is far too murky for this rule to be of any use.
rsynnott 10/28/2025||||
They could get Austria to do it, as it presumably has a spare slot.
outside1234 10/28/2025||
This raises an interesting question. Is there only one dialect of German in the LLM? My understanding is that the German German and Austrian German dialects are significantly different.
hebelehubele 10/28/2025|||
My German teacher always claimed that Swiss German and German German (Hochdeutsch) were so different that she needed subtitles to understand it, and she didn't understand why they weren't considered separate languages.
lhoff 10/28/2025|||
It depends. There is not one Swiss German but multiple subdialects. The language spoken around the Bern region is very far away from German, while the one from Zürich or Basel is much closer. Since there is no official written form, they never really converged to a homogeneous language.
ch4s3 10/29/2025||
This sort of thing always makes me think of the English my grandmother from the foothills of the Appalachian mountains spoke. It was very distinct from standard American English.
ipsi 10/28/2025||||
They really are very, very different. Knowledge of one helps with the other, but it's far more than just "a couple of weeks to adjust to the accent", for example.

EDIT: It's worth noting that this is mostly a spoken thing, AIUI - most formal/semi-formal writing would be in Hochdeutsch rather than a local dialect.

biztos 10/29/2025||||
Even Swabian, a dialect spoken mostly in Germany, is almost unintelligible to non-native speakers when spoken by the natives of a certain age.
avadodin 10/29/2025||||
Unless you're thinking of one of the other Swiss languages, Swiss German is actually a variety of Hochdeutsch.

Historically, Germany used to be divided into countless small fiefdoms and each of them used to speak unique barely intelligible languages.

Hochdeutsch is in opposition to Niederdeutsch which Dutch and arguably English are a variety of.

umanwizard 10/28/2025||||
They are in fact considered separate languages.
tacker2000 10/28/2025|||
Yes but in practice pretty much the same except for some local changes in grammar and vocabulary, in written form.

The dialects are a whole other thing though.

umanwizard 10/29/2025|||
Sorry, maybe I wasn't clear enough, but I'm specifically talking about colloquial Swiss German -- which is, I assume, what you mean by "the dialects" -- and not about Swiss Standard German, which is indeed very similar to German Standard German and can't be considered a different language.

Any literate German can read the NZZ easily, but they cannot have a colloquial conversation with an average person from Zürich, unless the latter switches to standard German (which is a foreign language for them, though one they have to learn from age 6).

eru 10/29/2025||
> Any literate German can read the NZZ easily, but they cannot have a colloquial conversation with an average person from Zürich, unless the latter switches to standard German (which is a foreign language for them, though one they have to learn from age 6).

I presume they also pick up a lot of standard German in the media: there's lots of German movies, and Germany has the biggest movie dubbing industry in the world, too. There's some Swiss German media, but not nearly as much as there's on offer in standard German.

adastra22 10/28/2025|||
The same could be said of all Chinese dialects, which are also formally considered separate languages by all linguists.
adastra22 10/28/2025||||
They are considered separate languages in the same way that Chinese “dialects” are considered separate languages.
eru 10/29/2025||
Some Chinese dialects are a lot further apart than eg English and German. They are mostly called 'Chinese dialects' rather than languages for political reasons. Gotta project that unity.
adastra22 10/29/2025||
Yes. That was the point I was implying :)
geretnal 10/28/2025||||
Try Dutch, it is a combination of German and English!
thetoon 10/29/2025||
This, but with something oddly french about it, at least in the way it sounds.

As a native french speaker, no other language gives me that "why don't I understand what they say... oh, right, that's not my language!" feeling. Something with frequencies used, I suppose, but it always puzzles me.

layer8 10/28/2025|||
If Switzerland was in the EU, it would certainly be made a separate official language.
ipsi 10/28/2025||||
When spoken? Almost certainly. But I think they mostly write in Hochdeutsch, especially in formal contexts, at least that I've seen (private chats/etc are a totally different matter), so I don't foresee any major issues there.
lxgr 10/28/2025||
Austrian Standard German is slightly different from the German variant, even when written. The differences are pretty minor, though, so it’s very possible to have a relatively long text without being able to tell which one it actually is (especially when potatoes are not referenced in it).
eru 10/29/2025||||
Well, even without any government mandates, ChatGPT is very happy to give you lots of dialects of English (and many other languages, too). Just ask for it.

Eg it does a passable impression of Singapore's Singlish.

thayne 10/29/2025|||
Not a native, but from what I understand, Austrian German is pretty similar to what is spoken in southern Germany, but northern Germany is significantly different.
runarberg 10/28/2025||||
Is English a legacy official language then, from the time the UK was a member? (I'm guessing Ireland nominated Irish instead of English.) As an aside, it feels very un-EU to push this limitation, as I was under the assumption that the EU was all about celebrating (European) diversity.
handelaar 10/28/2025||
Still an official language, thankfully. Officially, because of Cyprus.
Muvasa 10/28/2025|||
Malta and Ireland
adastra22 10/28/2025||
But if you’re only allowed one official language to add to the mix, they’d surely pick Maltese and Irish.
skissane 10/29/2025||
The history here is more complex than that… originally Irish was not an EU language because Ireland just used English… then as part of one of the cycles of EU treaty renegotiation, Ireland successfully pushed for it to be made a secondary EU language… and then later successfully pushed for it to be upgraded to full status… so Ireland actually has two EU languages, their original one (English) and their newer one (Irish). Because the practical reality is everyone in Ireland is fluent in English: around 60% of Irish people can’t even speak basic Irish, and fluent Irish speakers are <10% of the population.

Also, English remains one of the main working languages of the EU bureaucracy, because for many EU states (especially in Eastern Europe) it is a more popular foreign language than the other two (French and German). When Czech diplomats need to talk to Spanish diplomats, English is the language they choose.

This idea people have here that “each country gets to nominate a language” isn’t how it actually works. The treaties just contain a list of languages, and which languages are in the list is down to diplomatic negotiations not any coherent principle.

globular-toast 10/29/2025|||
Why Cyprus? Their official languages are Greek and Turkish.
handelaar 10/31/2025||
Because they also recognise English and Greek's already covered by Greece.
pasc1878 10/29/2025||||
Which country nominates English? The obvious suspects are Ireland and Malta which have nominated non English languages so it is not them.
gambiting 10/29/2025||
Well I can only assume that when UK departed the EU, English wasn't removed automatically even though no country remaining in the EU nominates it as their official language of choice.
ranc1d 10/29/2025|||
Irish and English are both official languages in the Republic of Ireland, with Irish being the first official language and English the second.

https://www.irishstatutebook.ie/eli/2003/act/32/enacted/en/p...

pasc1878 10/29/2025||
Read the comment I replied to?

It says that each country can only request ONE language. And Ireland requested Irish.

OJFord 10/29/2025|||
The UK was not a founding member of the EEC which preceded it, I don't know and haven't easily been able to find out, but it wouldn't be that surprising if the European Parliament already used English as a common language for policy etc.

(In fact to strengthen that probability, if it had been say French, when and why would it have switched to English? Just because the UK joined?)

piltdownman 10/28/2025|||
Including the nasty political side-show that is Ulster Scots - literally only brought in as a chilling effect 'whataboutism' to diminish support when Irish speakers ask for language rights in Northern Ireland.

https://www.reddit.com/r/northernireland/comments/1fivtob/no...

pqtyw 10/28/2025|||
Well Scots is a real language. As much as English or any other. Whether enough people speak it especially in NI to justify it having an official status and such is another matter.
AlecSchueler 10/28/2025|||
This completely ignores the history of published writing in Ulster Scots going back centuries.
wizzwizz4 10/28/2025|||
This is one of those topics where the Hacker News take is unlikely to be correct. There's a lot of strong feeling here, and an outsider would need at least three books to understand the historical context (one of which, afaict, has not been written yet: it's oral tradition only).

People closer to the issue are better-placed to gather the necessary information, but again: strong feeling. Most people find it hard to get past that. The most informed person I know is so biased that I don't at all trust their conclusions.

AlecSchueler 10/30/2025||
What do you think is the Hacker News take?
rgblambda 10/28/2025|||
Part of the issue some people take with Ulster-Scots is that the current official 21st century literature doesn't read anything like the historic literature, which English speakers can easily read and understand. It's often made up of slang terms and archaic spelling, in an attempt to be as different as possible to English. Native speakers have complained that official documents and signage in Ulster-Scots are incomprehensible to them.
AlecSchueler 10/29/2025||
> the current official 21st century literature doesn't read anything like the historic literature

Does modern English read like historical English?

> Native speakers have complained that official documents and signage in Ulster-Scots are incomprehensible to them.

Sure, there are tonnes of issues with the "officialisation" of any language but the fact that there are "native speakers" involved in the debate strongly suggests it wasn't all just made up for political reasons, which was the point I was responding to.

rgblambda 10/29/2025|||
>Does modern English read like historical English?

If you can read and understand text from the 18th century, then yes. We're not talking about Middle English or Old English.

>but the fact that there are "native speakers" involved in the debate

I should have put native speakers in quotes as well. What counts as a native Ulster Scots speaker is someone who speaks English with an NI accent with some localisms thrown in.

Nobody speaks the official Ulster Scots that was invented because the Irish language was getting support and political leaders on the other side of the community felt they deserved something as well. The Protestant community in NI see it as a bit of an embarrassment.

AlecSchueler 10/30/2025||
> If you can read and understand text from the 18th century, then yes.

Yes, and I can read and understand historical Ulster Scots as well, but you were making a different point about codification/drift, no? The English I would find in those historical writings is quite different from what is being taught in schools today or recommended in style guides.

> What counts as a native Ulster Scots speaker is someone who speaks English with an NI accent with some localisms thrown in.

Then by your definition I am a native speaker. So how can we square it that you're telling me native speakers feel one way while I feel another way?

> Nobody speaks the official Ulster Scots

That's the nature of any newly codified minority language.

> The Protestant community in NI see it as a bit of an embarrassment.

There is no "protestant community" in Northern Ireland. A Dungannon farmer, an East Belfast loyalist and a BT9 lecturer will all give you very different views despite being of protestant background.

rgblambda 10/30/2025||
My point regarding the "official" language is that it bears little resemblance to the dialect that largely died out in the 20th century. i.e. it's a fabrication. Contrast that with the differing dialects of Irish where the grammar is identical with some variations in pronunciation.

I'm not entertaining the notion that I have to pretend you're a native speaker when you've made clear you're only identifying as such for the purpose of making an argument.

>There is no "protestant community" in Northern Ireland.

Anyone who applies for a job in NI fills out a form where they are asked if they are a member of "the Protestant community", "the Roman Catholic community" or neither. You're denying the factual existence of the different communities in NI for the purpose of winning an argument on the internet.

AlecSchueler 10/30/2025||
> My point regarding the "official" language is that it bears little resemblance to the dialect that largely died out in the 20th century i.e. it's a fabrication

Could you outline the key ways in which it differs? And say why that suggests the language was later "fabricated?"

> I'm not entertaining the notion that I have to pretend you're a native speaker when you've made clear you're only identifying as such for the purpose of making an argument.

If you won't entertain the notion that I'm a native speaker could you amend your definition of "native speaker" or explain what differentiates me from the native speakers whose complaints you referenced previously? And could you let us know where we can read about their complaints?

> Anyone who applies for a job in NI fills out a form where they are asked if they are a member of "the Protestant community", "the Roman Catholic community" or neither.

Of course you understand that the "protestant community" is not an homogenous group with shared views and opinions on these things. The reason that question is on the forms is because of historical discrimination against Catholics and the need to quantify heritage issues in order to avoid such discrimination forwards.

One protestant might feel embarrassment, another might feel pride, and another might not care at all. Suggesting there's a unified view from "the protestant community" is disingenuous.

rgblambda 10/30/2025||
https://www.bbc.co.uk/northernireland/learning/history/state...

This will answer all your queries.

>Suggesting there's a unified view from "the protestant community' is disingenuous.

I've yet to meet a member of that community in person (now you've decided they exist) who has any interest in Ulster Scots as a language, (even people who are quite opinionated and argumentative on other NI topics). This is evident in the lack of Ulster Scots language classes. There are more Irish classes running in East Belfast than for Ulster Scots.

Outside of the political class (who are only interested in it as a means to stifle support for the Irish language) Ulster Scots advocates are exclusively found online.

AlecSchueler 10/30/2025||
> This will answer all your queries.

It doesn't. It's just an opinion piece about the use of neologisms in certain publications. It makes the same claim about incomprehensibility for native speakers but also fails to reference the voices of any actual native speakers. Who are they? Do they really complain about this as you said?

> I've yet to meet a member of that community in person who has any interest in Ulster Scots as a language

Well? I have met them. I've met lecturers at Queens such as Ivan Herbison studying the thing, I've met artists like Willie Drennan touring the country sharing contemporary poetry and song in Ulster Scots. I've met people in the countryside of Antrim not only with an interest in it, but speaking it day to day. Just because you haven't personally encountered these people doesn't mean they don't exist.

> now you've decided they exist

This is quite unfriendly. I made a clear distinction between what you were claiming--a single protestant community who are collectively embarrassed by Ulster Scots--and the collection of people with a shared background who identify as protestants for the sake of anti-discrimination laws, but who are otherwise diverse in their beliefs and opinions. To say that in so doing I somehow conceded your original claim is again disingenuous. It also seems absurd in relation to your broader point to now insist that just because some politician decided a form should say "protestant community" that that is necessarily reflective of an on-the-ground reality.

> There are more Irish classes running in East Belfast than for Ulster Scots.

By your definition of native speakers everyone in East Belfast is already brought up speaking Ulster Scots at home, so of course there's more interest in other languages. There are more people from East Belfast attending Irish classes than English classes too, it doesn't mean no one is interested in English.

rgblambda 10/30/2025||
You asked for opinions and you got opinions. I can't disprove your claims about who you've met and what language they were speaking. I can only say it's at odds with my experience.

>By your definition of native speakers everyone in East Belfast is already brought up speaking Ulster Scots at home

But reading and writing in it? And would they agree they're speaking Ulster Scots or would they say it's English?

>There are more people from East Belfast attending Irish classes than English classes too

Did you not learn English in school? I find it hard to believe English isn't taught in East Belfast schools. And that's not counting English as a second language classes for immigrant communities. What language is the signage in in East Belfast?

piltdownman 10/29/2025|||
My comments are entirely aside from the dialect vs. language argument, as a minuscule minority care about Ulster Scots in NI as a language in its own right - comparable even to, say, Cant or Shelta - versus the usual Stormont tomfoolery like 'cash for ash' scandals.

Simply put, Ulster Scots prominence in legislation is merely a reflection of bad-faith political negotiations by Unionists to degrade the status of the Irish Language Act by proxy. Anyone on the ground knows it for the dog-whistle that it is, used simply to curry favour with a particularly sectarian unionist base as a counter to the Irish Language provisions outlined and agreed to in the Good Friday Agreement.

And that's 'curry favour' - not 'curry my yoghurt' by the way. https://www.bbc.com/news/uk-northern-ireland-29895593

This has more or less been the case ever since the forced Ulster plantations led to the development of Ulster Scots as a defined community with resilient Protestant and unionist ties. It'd be far more credible if Fingal tried to secede from Dublin and the Republic tomorrow morning using Yola as a justification.

https://en.wikipedia.org/wiki/Yola_dialect

In short, the ILA and promotion of Gaeilge in the north is about trying to make some small reparation at a state level for a cultural genocide perpetrated by our Colonists, and to help re-establish the oldest written vernacular language in western Europe, dating back over 2,500 years.

The promotion of Ulster Scots however... well the Commissioner is literally called 'Commissioner for Ulster Scots and Ulster British Tradition'. This is after DUP members removed themselves from the equality and good relations group after basically filibustering for 5 years of discussions on bi-lingual signs to force a stalemate.

https://www.belfastlive.co.uk/news/northern-ireland/dup-stor...

AlecSchueler 10/30/2025||
> My comments are entirely aside from the dialect vs. language argument...

Ah right, I get you now! The point you're making is fair enough, apologies for drawing the labour from you to explain it so fully.

sigmar 10/28/2025||||
Should be noted- the Netherlands can't unilaterally make changes. Spain has been trying to push for languages to be added and hasn't had luck.
Vinnl 10/28/2025||
Haha I just added it as a fun fact, I don't actually believe folks will need to start retraining things, or that this is likely to be at the top of the priorities list for anyone. Party programmes are aspirational anyway.
rzwitserloot 10/29/2025||||
Not sure what happened there but your link disproves your statement.

Specifically, the link says two things:

1. That 2 parties want to add *limburgish* to the list, not frisian. That's the bottom-right part of The Netherlands, about as far removed from Friesland as you can get (which is the top part of the Netherlands).

2. That one party wants to add Frisian, but that is a one-day fly party that will cease to exist in a few hours as they will get 0 seats in this election and will presumably call it a day right after. It was a party founded to support one person and that person has quit due to work stress, and is highly unlikely to return as this _was_ his return. Their opinion used to be relevant as they had 13.3% of the seats this past session (and didn't exist before it). But, it isn't here.

Vinnl 10/29/2025||
Whoops yeah, misremembered - didn't reread as I posted the link. Was more of a fun off-the-cuff remark, so didn't spend too much energy on it. But yes, I meant Limburgish rather than Frisian.
ginko 10/28/2025|||
Just do a 50:50 mix of the German and Dutch model weights.
Vinnl 10/28/2025||
Oops, accidentally made the model speak Limburgish.
purrcat259 10/28/2025|||
I read, write and speak Maltese, AMA if you are curious about the language.
franklin_p_dyer 10/28/2025|||
Not a question, but - Tatoeba could use your help! It is an open source (both code and data) dataset of parallel sentences and their Maltese data is very lacking. Also it’s pretty fun to just translate a bunch of random sentences into a language you speak. :-)

https://tatoeba.org/

Raed667 10/28/2025||||
Tunisians claim they can understand Maltese with minimum effort, is it reciprocal? How close is Maltese to arabic / tunisian dialect ?
purrcat259 10/28/2025|||
I don't have much personal experience in attempting to communicate with arabic speakers. From others I have heard Lebanese arabic is the closest and you can have a passable conversation.
arbuge 10/28/2025|||
Not sure which Tunisians are claiming this but they'd definitely need a lot more than minimum effort. Maltese split off from Arabic around 1k years ago. The two languages sound pretty different, and are written with different alphabets.
findyoucef 10/28/2025|||
As an Algerian, I can confirm that Maltese is surprisingly easy to understand. I was genuinely shocked the first time I heard it because the similarities are so obvious. Many Arabic dialects are also written using the Latin alphabet, especially online and on social media, so the different writing systems aren’t really a barrier at all.
arbuge 10/30/2025|||
Calling BS on this one. I'll let ChatGPT handle it... it says it better than I could:

can arabic people understand maltese?

That’s a really interesting question — and the answer is: *partially, but not easily.*

Here’s why:

### Linguistic roots

Maltese is a *Semitic language*, and its *core grammar and basic vocabulary* come from *Arabic*, specifically from *Siculo-Arabic*, the dialect of Arabic spoken in Sicily and Malta about 1,000 years ago. Because of that, *many Maltese words sound familiar* to Arabic speakers — especially from the *Maghrebi (North African)* or *Levantine* dialects.

For example:

| Maltese | Meaning | Similar in Arabic |
| ------- | ------- | ----------------- |
| Dar | house | دار (dar) |
| Kelb | dog | كلب (kalb) |
| Seba | seven | سبعة (sabʿa) |
| Xemx | sun | شمس (shams) |

### Influence from Italian and English

However, over the centuries, Maltese absorbed *a lot of Italian (especially Sicilian)* and *English* vocabulary — so modern Maltese is *a hybrid*. Roughly:

* 30–40% of its vocabulary is Semitic (Arabic origin),
* 40–50% is Romance (mostly Italian/Sicilian),
* and the rest is English and other sources.

That means Arabic speakers might *recognize some words and structures*, but they’ll *struggle to understand full sentences*, especially because:

* Pronunciation has changed,
* Grammar evolved differently,
* Many everyday words are not Arabic anymore.

### Summary

So:

* *Yes*, Maltese and Arabic share a deep connection — like cousins.
* *No*, they’re *not mutually intelligible* today. An Arabic speaker might catch words here and there, but a real conversation would be hard without studying Maltese.

The above is exactly my experience with Arabic speakers by the way. Again, not surprising after 1k years of divergence.

findyoucef 10/30/2025||
I will let my own experience tell the story instead of chatgpt.
slim 10/29/2025||||
Tunisian dialect must have split off at the same time, because it's as far from arabic as maltese is. most arabs don't understand our dialect (fortunately we also speak standard arabic which we learn at school). I read some research saying maltese/tunisian is a separate language called lingua franca
Raed667 10/29/2025||
Nice seeing you around here =) been a while !
slim 11/8/2025||
call me when you are in Tunis :)
cenamus 10/28/2025|||
Also lots of influence from Italian and English.
barrell 10/28/2025||||
I recently discovered Maltese existed, and started learning it that day. I find it such an awesome language, and not just because of the letter Ħ

I do wonder what natives think and feel about the longevity of their language? What is taught in schools at what ages (assuming English is in the mix somewhere). Is there enough media in Maltese for a Maltese person to go about modern life fully in Maltese? It’s shockingly hard to find any information on Maltese, and even harder to find content.

I’m not sure if it’s dying out, or in danger thereof; if there are preservation efforts, or if there is no need.

lullu57 10/28/2025||
Native Maltese speaker here. It is taught in schools alongside English, with both being official national languages. Most locals who are not foreign-born or immigrants speak the language, and it is used in most households as the main language. But everyone grows up bilingual, as English is essential for most everything else that we do as a nation.
nxor 10/28/2025||||
How are loan words viewed? Do businesses work in Maltese? Are monolingual speakers of the language regarded differently than those fluent in English? Do young people in Malta listen to Maltese music?
purrcat259 10/28/2025|||
Maltese has been loaded with loan words since forever. 5 points if you can guess where bonġu, bravu and mappa come from. At some point there was some literary council for the language that decided that any new loan words should just be spelled phonetically. Computer became kompjuter.

Businesses do work in Maltese and English. Both are official languages. It's quite rare to encounter a business that deals near exclusively in Maltese. Many prefer Maltese but will fall back to English where necessary.

Regarding monolingual speakers, I think there's a lot of stereotypes for Maltese only, English only and code switchers. I think it's all a bit silly... So as long as communication can happen I don't fuss.

On Maltese music... There's a lot of low-ish quality music then there's a few absolute gems. Look up The Travellers, Lapes, Jon Mallia on YouTube/Spotify.

unscaled 10/29/2025|||
Not sure if I should get bonus points for that, but if mappa means map, the ultimate origin is still Semitic. Latin seems to have taken the word mappa from a Canaanite language. The word mappa (and its older version "manpa") is attested in Mishnaic Hebrew (meaning a napkin or a tablecloth), although you could say Hebrew "re-loaned" the cartographic meaning - which is much newer.
lullu57 10/28/2025||||
I can concur. All older words (think any word that was needed since the older generations), are Arabic based. All the numbers, all older verbs etc. 'Newer' words are latin based.
nxor 10/28/2025|||
Interesting, but I get the impression that ubiquitous English loan words in seemingly every language is a lot different than loan word patterns of the past. Do you think? Maybe not?
purrcat259 10/28/2025||
I don't have much of an opinion. I suppose English language cultural dominance has meant that newer words are just imported rather than adapted.
JAlexoid 10/28/2025|||
Yes, there's plenty of Maltese spoken and listened to.

I was surprised to hear Maltese radio stations played in taxis, while visiting Malta just a few weeks back

nxor 10/28/2025||
The point of my question was to ask someone who lives there, not someone who visited
JAlexoid 11/4/2025||
Nowhere did you specify that. I suggest being specific about your request
adzm 10/28/2025||||
I'm actually really curious about everyday usage of the language; is code switching between English and Maltese more common than Maltese on its own? I've seen a few online communities where the vocabulary switches between Maltese and English very often which is interesting but I wonder how much of that is just online / written versus everyday speech.
purrcat259 10/28/2025||
Depends on where you live and how you were brought up, but for the most part code switching is default.

There was a point about 7 years ago when the overton window shifted to "speak english to strangers first" because of a large influx of foreigners who did not know the language. Since then I've met foreigners who have better Maltese than some natives.

Older folks & geriatrics will sometimes be surprised when they assume someone is foreign and they turn out to be Maltese. "int Malti??" is a statement I get often because I don't look Mediterranean despite being born here.

ebb_earl_co 10/28/2025||||
What is the name of Maltese in Maltese? Like “el español” in Spanish, it’s neat to know what languages call themselves
kwk1 10/28/2025|||
A term for that concept, by the way, is "endonym":

https://en.wikipedia.org/wiki/Endonym_and_exonym

ggsp 10/28/2025||||
Wikipedia says it's "Malti"
arbuge 10/28/2025||
Il-Malti to be precise. Il- means "the" and changes its meaning to that of the language. Malti alone would mean a Maltese person.

Source: I'm also Maltese.

jll29 10/28/2025||
The "Il" in Il-Malti is like "al" in Arabic, which Maltese is closely related to as was pointed out above.

Arabic (language): al-‘arabiyyah (الْعَرَبِيَّة).

kridsdale3 10/28/2025|||
'ish' is a pretty universal english suffix. So Spanish is just "españ-ish".
Tade0 10/28/2025||||
How is "Marsaxlokk" really pronounced? I've heard that word a few times, but never from a native. Google translate can't help me here, as it doesn't seem to have Maltese text-to-speech.
purrcat259 10/28/2025||
Read with English pronunciation, closest would be mar-sa-shlock.
cess11 10/28/2025||
From my experience it will be understood by locals when pronounced like that.
runarberg 10/28/2025||||
Is there any dialect of Arabic which you can understand without too much effort?

How much do you consider Maltese its own language (as opposed to a dialect of Arabic)?

purrcat259 10/28/2025|||
From what I have heard, Lebanese Arabic is the closest, and still pretty far. Passable conversation is possible.

Maltese is definitely its own language. Arabic roots are there (there's a Semitic joke in there) but it isn't Arabic anymore. It's written left to right with a variant of the English alphabet.

aprilthird2021 10/29/2025||
Writing RTL or LTR and alphabet alone don't make a language different.

Hindi and Urdu are 90% the exact same language, and are mutually intelligible (an Urdu speaker and a Hindi speaker can have a complete conversation with each other), but each is written differently (one LTR, the other RTL) and with different alphabets.

runarberg 10/29/2025||
See also Croatian, Serbian, and Bosnian. I also find Chinese to be interesting, e.g. Mandarin and (formal) Cantonese have a near identical written language, while the spoken language is completely different, views on whether or not those languages are different languages or dialects vary wildly.

In my books, the distinction between languages and dialects are so arbitrary that the best method is simply to ask the people that speak those languages/dialects. If they consider them to be different language (which Maltese speakers seemingly do) I call them different languages.

aprilthird2021 10/31/2025||
Mandarin-Cantonese is very interesting and a unique (to my knowledge) example where the same written language can be completely different to two different people.

I don't buy the argument of just asking the speakers. There are cultural, political, etc. reasons people may think things which don't conform with reality. Many Hindi-Urdu speakers get insulted by the reality that the languages are pretty much the same because they don't want to identify with people from another country their country is constantly at war with.

notahacker 10/28/2025|||
I know that the reverse understanding isn't too bad from chatting with a Saudi-born member of staff on holiday in Malta.

I don't think anyone would seriously consider it a dialect of Arabic though with its completely different alphabet and half the vocabulary and morphology coming from Italian languages/dialects, even if Malta hadn't spent the best part of a millennium trying very hard not to become part of the Arab world

cm2012 10/28/2025|||
Can you communicate with Maltese dogs more effectively?
purrcat259 10/28/2025||
Only if we have a few Maltesers first
jim180 10/28/2025|||
Lithuanian and Latvian are Baltic languages. Nothing to do with Slavic...
adzm 10/28/2025|||
I was thinking about separating the two groups when I was writing this but was afraid of getting too verbose, though in retrospect that probably would have made more sense regardless of the historical lineage. My apologies if this came off as inconsiderate.

I updated my original comment, and learned a good amount about that dispute as a result, so thanks for calling it out.

Telaneo 10/28/2025||||
https://en.wikipedia.org/wiki/Balto-Slavic_languages
asveikau 10/28/2025||
See the section "historical dispute".

I think some people get touchy about them being lumped together if their last period of commonality (per the article) was 1400 BCE. For comparison, I believe all the Slavic languages were mutually intelligible around 1200 AD. But much more recently than this, in the last few centuries, there have been notable attempts by east slavs to absorb the Baltic language cultures and deny them.

krzyk 10/28/2025||
I doubt that South Slavic and West/East Slavic were mutually intelligible at 1200 AD.

I doubt West and East Slavic were. But inside those geographic groups they probably were (Czech and Polish AFAIR were around that time).

asveikau 10/28/2025|||
I may be off by 100-200 years, but this is what I read. There were accents and regionalisms but they were all mutually intelligible.

It is an example I think of often, about how quickly languages can change. In the scale of 1000 years, a lot changes. Most of the diversity in Romance languages is from around that timescale too, it really started to diverge substantially around 900ad-1100ad.

actionfromafar 10/28/2025|||
Depends on your standards, too. Even today, any pair of slavic speakers should have a head start in understanding each other. Put them next to each other for a month and they should be talking, at least about basic everyday things.
krzyk 11/1/2025||
Not quite. My anecdotal examples. I'm Polish.

I was in Crimea for about 2 weeks (in 2012); they spoke Russian there. I couldn't understand a word they said, and I didn't learn to understand it in 2 weeks of travel there.

I could understand some words from Ukrainian (I traveled by train from Lviv).

Another example is Croatian, I've been there on vacation and renting a room. I couldn't understand a word they said and didn't learn any.

I can understand some Czech (because this is the closest language together with Slovakian to Polish) but that's it.

I wouldn't mix Slavs from different groups together. They evolved separately and are as close as English and German.

kaato137 10/28/2025|||
Balto-Slavic branch divides into Baltic and Slavic language groups so nothing wrong here
sublimefire 10/28/2025|||
It is just one of the theories, there is no clear evidence to suggest that Baltic and Slavic were the same language thousands of years ago.
pqtyw 10/28/2025||
Well there is if you go far enough. It's just the question when did they split off from each other. However there is no question that Baltic and Slavic are more closely related to each other than any other non extinct Indo-European languages.

The fact that they are the closest surviving relatives on its own doesn't mean it makes sense to group them together (i.e. Italo-Celtic is also a theorized subgroup in a similar way but nobody is disputing that Celtic and Italic languages evolved into distinct groups).

Then there is a huge amount of missing links and unknown unknowns. e.g. Thracian and Dacian probably were also pretty close to Baltic or Slavic (maybe even closer to Baltic than Slavic is but we don't know enough about them to make any conclusive claims at all... but we at least know these languages existed)

Tade0 10/28/2025||||
Plenty of wrong here, considering Lithuanian and Latvian are utterly unintelligible to slavs, save for loanwords, but Slavic languages between themselves retain some level of intelligibility, which even spawned two competing constructed languages.
kreetx 10/28/2025|||
Yup, most of Eastern Europe are Balto-Slavic. While the division from the Eastern Slavic languages (Russian, Belarusian, Ukrainian, etc) is distant, they are still Slavic. From Eastern Europe, only Estonian is not a Slavic language.
NicuCalcea 10/28/2025|||
> From Eastern Europe, only Estonian is not a Slavic language.

Well, that and Romanian. And Hungarian. And outside the EU, Albanian. And Georgian, Azeri and Armenian if you consider those Eastern Europe.

kreetx 10/28/2025|||
I regret being that loose with the designation :), Romanian and Hungarian are valid counter arguments.

In my mind, I was thinking of the belt of countries between Russia and Central Europe, starting from the Baltics down to the Balkan (excluding Greece).

NicuCalcea 10/28/2025||
Even by your definition, I can count at least seven countries where the official language is not Slavic. And that's not even including all the Altaic, Romance and other assortment of regional languages, many of which have some sort of official status.
ardit33 10/28/2025|||
Albania is not "East Europe", but South East. Same as Greece.
NicuCalcea 10/28/2025||
That's just your opinion, and the UN would disagree: https://www.un.org/dgacm/en/content/regional-groups#:~:text=...

Some of my fellow Romanians will also claim they're Central European, but in my mind, all the ones I listed are Eastern European countries. I'd even include Turkey and Kazakhstan in there, part of the latter is to the West of the Urals, which is what we normally consider the border between Europe and Asia.

ardit33 10/29/2025||
That's cute. It is clear that's an outdated political organization and not geographical. Look at the groupings. Greece is more eastern than Albania (and it is one hour off timezone), and it says 'western' which is not the case by any geographic means.

https://www.researchgate.net/publication/382295560/figure/fi...

https://www.worldatlas.com/r/w960-q80/upload/03/90/9b/countr...

Albania is clearly south east europe.

And, I don't care about your random Romanian friend's anecdote.

NicuCalcea 10/29/2025||
> Albania is clearly south east europe.

Yes, it is clearly south east Europe. East.

> And, I don't care about your random Romanian friend's anecdote.

Who's my friend?

rich_sasha 10/28/2025||||
Latvian and Lithuanian are not at all Slavic.

There is a branch that contains both Baltic and Slavic languages, but there's also one that contains Albanian and Greek.

ardit33 10/28/2025||
Albanian and Greek are both completely separate branches, and both unique on the tree (they don't have common cousins like the others).

There have been some attempts to tie Albanian to Germanic, or Greek, or other branches, but they all have failed.

At some point they all are Indo-European, but they split a long way back.

d1sxeyes 10/28/2025||||
Hungarian too, although there’s a question about whether Hungary is Eastern or Central Europe.
dragonwriter 10/28/2025|||
“There’s a question” implies that there is a ground truth that might be discovered to resolve this rather than simply a clash of different purely arbitrary definitions of the same terms.
d1sxeyes 10/29/2025||
Not my intention. In fact my intention was the opposite: just to highlight it’s a bit of a contentious topic.
lo_zamoyski 10/28/2025||||
The Visegrad 4 (Poland, Czechia, Slovakia, Hungary)are generally taken to be "Central European". The strict East/West division is largely a product of the Cold War and the Iron Curtain.
1718627440 11/1/2025|||
No, the distinction into West/Central/East Europe was also relevant in the centuries prior. You're right that East Europe starts with Belarus, Russia and Ukraine.
lo_zamoyski 11/5/2025||
> No, the distinction into West/Central/East Europe was also relevant in the centuries prior.

I never said it wasn't.

1718627440 11/6/2025||
> The strict East/West division is largely a product of the Cold War and the Iron Curtain.
d1sxeyes 10/29/2025|||
Perhaps. The UN still calls them “Eastern European” though.
kreetx 10/28/2025|||
Ah, yes, how could I forget! As a side note, though it is also Finno-Ugric, its similarity in sound and appearance to Finnish or Estonian at least appears very distant.
d1sxeyes 10/29/2025||
Yeah Hungarian is just a thing on its own.
pqtyw 10/28/2025|||
> most of Eastern Europe are Balto-Slavic

and

> only Estonian is not a Slavic language.

So following this logic saying "in Eastern Europe, only Estonian is not a Baltic language" would make as much sense?

cyfex 10/28/2025|||
> Greek being the only Hellenic one

Are there really any other Hellenic languages besides Greek?

skissane 10/29/2025||
Cappadocian Greek is Greek heavily mixed with Turkish, to the extent that it is arguably better viewed as a distinct Hellenic language rather than just a nonstandard Greek dialect. However, around a century ago, most Greek speakers were expelled from Turkey and deported to Greece (and the same happened in reverse, most Turkish-speakers in Greece were deported to Turkey), including almost all Cappadocian-speakers - and they and their descendants largely switched to standard modern Greek - with the result that it was long believed that Cappadocian had died out in the 1960s, although more recently it has been discovered that there remain small populations of Cappadocians in rural Greece keeping the language alive.
sva_ 10/28/2025|||
Seems like the model isn't limited to those though, from the paper:

> as well as some additional relevant languages (Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian).

https://arxiv.org/pdf/2409.16235

The paper also goes into detail on training set sources, and I feel the curation thereof might be considered the main contribution of this publication?

ChrisMarshallNY 10/28/2025|||
Flemish? I remember watching a TV show in Flemish (Hotel Beau Séjour[0]), so it's prevalent enough to invest that kind of money into.

What about Basque? Is that too controversial?

[0] https://en.wikipedia.org/wiki/Hotel_Beau_Séjour

yvdriess 10/28/2025|||
Flemish is more of a political construct than a linguistic one: it's a grouping of the Belgian-Dutch coastal, Brabant and Limburg language groups, each having their own regional dialects.
OptionOfT 10/28/2025||
It's more than political. In speaking Flemish is to Dutch as UK English is to US English. In writing however there is no difference in spelling, but there is a difference in word choice.

Now, being from Belgium, even within that small part of the country where everybody is supposed to speak Dutch, I genuinely don't understand people from near the coast, which was about 150 miles from where I used to live.

yvdriess 10/29/2025|||
Well yes, the dialects are very distinct linguistically, but what is often referred to as Flemish is the Dutch "tussentaal" aka "verkavelingsvlaams"[1]. That's not really a language per se, it's a regiolect of the official Dutch language, itself a Dutch variant of the Brabant dialects. The Flemish Dutch is usually used as a lingua franca because the official Dutch otherwise sounds too formal (and native dialect speakers are foreign language speakers of Dutch). If I was to nominate a regional language for recognition it would more be the regional dialects like Brugs, Gents, Antwerps, Brussels Vloms, Hasselts, etc.

What I find interesting is that the differences in Flemish dialects make them much more distinct than what one would normally call dialects. There are significant grammatical differences beyond the usual vocabulary differences. For instance, coastal Flemish conjugates yes and no[1], Limburgisch is a tonal language.

[1] https://nl.wikipedia.org/wiki/Tussentaal

adastra22 10/28/2025|||
Are you saying UK and US speak a different language? No one doubts it’s a different dialect, but we are talking about languages.
funac 10/29/2025||
that is a political distinction
adastra22 10/29/2025||
Mutual intelligibility is not a political distinction.
mytailorisrich 10/28/2025||||
I think those 24 languages reflect all the languages that are official languages at country level.

So for instance, Basque is not an official language of any country (only French in France and Spanish/Castilian in Spain). Belgium's official languages are French, Dutch, and German, "Flemish" is only a local variant of Dutch (Belgian French is also only a local variant of French).

contravariant 10/28/2025|||
Official is a weird concept though. Turns out Dutch law never really bothered to define an official language, Dutch simply is the de facto standard and is required for a lot of things, making it effectively the standard. This makes Dutch Sign Language the only language officially recognised by law. An attempt to recognise Frisian and Dutch as official languages in the constitution failed.
rags2riches 10/28/2025|||
Sweden didn't have an "official" language before the Language Law of 2009. Five minority languages (Finnish, Meänkieli, Romani, Sámi, Yiddish) were officially recognized as such since 1999.
adastra22 10/28/2025|||
Same situation in USA, believe it or not.
ChrisMarshallNY 10/28/2025||||
Thanks. That makes sense.

In the US, people will resort to fisticuffs, over variants of Spanish. I usually translate into Castilian Spanish, because that seems to be the equivalent of "Vanilla" Spanish. No one is really happy (except the Spaniards), but I'm not accused of favoritism.

trollbridge 10/29/2025||
For what it’s worth, Castilian sounds very odd to American ears. For a good time you can ask «¿en castellano?» and be met with either a blank stare or laughter.
tirant 10/28/2025|||
Basque is an official language and declared as such in the Spanish constitution however restricted only to the regions that decide to apply it (Basque Country and Navarra).
mytailorisrich 10/28/2025||
If we want to go all legal, I believe that Spanish/Castilian is the only official language of the State, so at country level, with the other "Spanish languages" only official in their respective areas:

Section 3

(1) Castilian is the official Spanish language of the State. All Spaniards have the duty to know it and the right to use it.

(2) The other Spanish languages shall also be official in the respective Autonomous Communities in accordance with their Statutes.

(3) The richness of the different linguistic modalities of Spain is a cultural heritage which shall be specially respected and protected. [1]

[1] https://www.senado.es/web/conocersenado/normas/constitucion/...

tirant 10/28/2025||||
Basque is not controversial, but it is spoken by very few people.
embedding-shape 10/28/2025||
Not sure that should be the qualifier, there might be more people able to speak Basque in the world than Danish, doesn't stop Danish from being well supported.
Levitz 10/28/2025||
Quick google points to about 1M Basque speakers in the EU against 5-6M Danish speakers, there's also the fact that Basque is not the only official language in the country it belongs to, and that it's in fact not spoken in the vast majority of the country.

From https://european-union.europa.eu/principles-countries-histor... we can find an excerpt relating to the policy and its purpose:

>One of the EU’s founding principles is multilingualism.

>This policy aims to:

>communicating with its citizens in their own languages

>protecting Europe’s rich linguistic diversity

>promoting language learning in Europe

With this in mind, the first intention fails by an enormous margin, given that 95%+ of Spain doesn't speak an iota of Basque, the second is met handily, given the long history of the language, and I'm not sure what to think about the third, any language whatsoever would serve that purpose.

adastra22 10/28/2025||
Irish would have been a better comparison. More speakers of Basque in Spain than Irish in Ireland.
td540 10/28/2025|||
like British English vs US English, Flemish is a dialect of dutch
unscaled 10/29/2025|||
I think this sentence should be easily readable to a Flemish speaker: "A shprakh iz a dialekt mit an armey un flot"

https://en.wikipedia.org/wiki/A_language_is_a_dialect_with_a...

2000UltraDeluxe 10/29/2025||
I think that sentence is easily readable for most people who speak at least one Germanic language.

It's mostly true, though, even if it's a somewhat simplified view.

ChrisMarshallNY 10/28/2025|||
Ah. That makes sense.

It's all Greek, to me...

adastra22 10/28/2025||
No, that’s the other side of the continent.
ChrisMarshallNY 10/29/2025||
I know. It was a joke.

Old-fashioned one.

Get off my lawn...

amarant 10/28/2025|||
I find it interesting that Norwegian isn't on the list.

I have often joked that Norwegian is just a dialect of Swedish, but I never expected to get official validation like this!

rcbdev 10/28/2025|||
Norwegian is not on this list, because in fact no country with Norwegian as their national language is part of the European Union at the time of writing.
2000UltraDeluxe 10/29/2025||||
"Norwegian" isn't just one unified language and Norway isn't in the EU.

That being said, the Scandinavian languages all come from Old Norse, and modern national constructs aside, most of the people in those areas descend from the same mix of Germanic tribes. There's no denying that modern-day Danish, Norwegian and Swedish are very similar.

emil-lp 10/28/2025||||
Norway isn't in EU, though.
bdhtu 10/28/2025||||
Norway isn't in the EU.
_kidlike 10/28/2025|||
In Greek we call our language Hellenic, and our country Hellas. "Greek" / "Greece" don't exist in the Hellenic language.
ranadomo 10/28/2025|||
> Γραικοί, Graikoí were an ancient Hellenic tribe

https://en.wikipedia.org/wiki/Graecians

3836293648 10/28/2025|||
Yes it does, it was a Greek colony off the southern coast of Italy, which was the primary Greek connection to the Romans, which is how the name stuck.
adastra22 10/28/2025||
Much like the many names for Germany.
fsckboy 10/28/2025|||
Is Ireland the only country to bring in two languages, Irish/Gaelic and English? Is English an official language of any other EU countries?
layer8 10/28/2025|||
English is an official EU language because Regulation 1 Article 1 says so [0] and hasn’t been changed. In practice, English is the most widely used language in EU institutions, so it would have been silly to remove it after Brexit.

[0] https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:01...

rcbdev 10/28/2025|||
It's a national language in Malta, making it a popular destination for "language weeks" in European schools, where English is usually a main subject.
layer8 10/29/2025||
That’s not why it is an official EU language, however.
ChocolateGod 10/28/2025||||
English at this point has stopped culturally belonging to the United Kingdom, and whilst one can discuss its not so very moral way of getting there, it's become the bridge language for people of different languages to communicate in, further solidified by the internet.
raattgift 10/28/2025|||
That said, whenever there is a language selection UI (e.g. at banking machines or institutional websites) in wider Europe that uses flags to represent languages -- probably not a good idea to start with, but very common -- the Irish tricolour should be used to indicate English rather than the UK or USA flags. (although cf Airteagal 8 of Bunreacht na hÉireann).
layer8 10/29/2025||
Relatively few people would recognize that it’s meant to stand for English (including myself), so I’m unconvinced that this would be better.
raattgift 11/1/2025||
Cool, so the European Union and overlapping institutions could see this as an opportunity to promote greater public knowledge about one of their respective member states. Seems like an argument in favour of encouraging the display of a member state's flag rather than that of a non-member-state or former member state (especially given that state's history with respect to Ireland).

Using flags alone is already poor UI since there are many languages which spill across the borders into multiple member states and non-member states, and some member states with multiple official and commonly spoken languages.

But a menu item that reads: [Irish flag] (English) like one that reads [Swedish flag] (Svenska) does not seem worse than the legacy use of the UK flag or the popular use of the US one.

adzm 11/3/2025||
This is how we end up with a white box 'flag' that says 'en'
JAlexoid 10/28/2025||||
I believe Malta has English as an official language.

PS: Gaelic is a more general term for Irish and Scottish. Ireland brings specifically the Irish (Gaeilge in Irish) language.

rags2riches 10/28/2025||||
Malta has Maltese and English as official languages. I don't know what they bring to the EU list of official languages.
ginko 10/28/2025|||
AFAIK Ireland only listed Gaelic as their official language with UK having English. That caused a bit of a problem during Brexit since technically English wasn't officially an EU language anymore. I guess they resolved it somehow.
rat87 10/28/2025|||
Why Italic as opposed to Romance/Latin? I don't think there are any surviving non-Latin branches of the Italic family, are there?
ks2048 10/28/2025|||
From other comments, it seems many people don't realize that there are 11 more languages than these 24 official (this is mentioned in the paper):

Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.

jll29 10/28/2025||
+1
zhengiszen 10/28/2025|||
Maltese is derived from dialectal Arabic
Qem 10/29/2025|||
What about Basque, is it not included?
ranc1d 10/29/2025||
Basque doesn't seem to be but Catalan, Galician are

https://huggingface.co/utter-project/EuroLLM-9B

jenadine 10/28/2025|||
No Luxembourgish?
punnerud 10/28/2025|||
Norwegian is also included, based on the model card: https://huggingface.co/utter-project/EuroLLM-9B
unscaled 10/29/2025||
Considering there are two different official written forms of Norwegian, that's not really saying enough, but I guess they mean Bokmål.
threesmegiste 10/28/2025||
Turkish?
runarberg 10/28/2025||
Is official in Northern Cyprus. But as I understand it while the whole island of Cyprus is in the EU, the state of Northern Cyprus isn’t.
Stagnant 10/28/2025||
Title is missing "(2024)". The 9B model was released last December[0].

0: https://sites.google.com/view/eurollm/home

htrp 10/28/2025||
>The EuroLLM Team brings together some of the brightest minds in AI including Unbabel, Instituto Tecnico Lisbon, the University of Edinburgh, Instituto de Telecommunicacoes, Université Paris-Saclay, Aveni, Sorbonne University, Naver Labs, and the University of Amsterdam.

>Europe is the only continent in the world to have a large public network of supercomputers that are managed by the EuroHPC Joint Undertaking (EuroHPC JU). As soon as we received the EuroHPC JU access to the supercomputer, we were ready to roll up our sleeves and get to work. We developed the small model right away and in less than 6 months the second model was ready.

[1] https://www.eurohpc-ju.europa.eu/eurohpc-success-story-speak...

Repurposing some of that physics sim compute

biohazard2 10/29/2025||
>Europe is the only continent in the world to have a large public network of supercomputers that are managed by the EuroHPC Joint Undertaking (EuroHPC JU).

Who would have thought that Europe is the only continent to have a network of supercomputers managed by Europe⸮

blitzar 10/29/2025||
This is the extent of the moat.
loandbehold 10/28/2025||
Aren't all frontier models already able to use all these languages? Support for specific languages doesn't need to be built in, LLMs support all languages because they are trained on multilingual data.
melvinmelih 10/28/2025||
> because they are trained on multilingual data

But they were not trained on government-sanctioned homegrown EU data.

sunaookami 10/28/2025|||
Who in their right mind would use this?
tensor 10/28/2025||
I'd use a model trained on a targeted and curated data set over one trained on all the crap on the internet any day.
loandbehold 10/28/2025|||
I keep hearing that LLMs are trained on "Internet crap" but is it true? For instance we know from the Anthropic copyright case that they scanned millions of books to make a training set. They certainly use Internet content for training but I'm sure it's curated to a large degree. They don't just scrape random pages and feed them into an LLM.
airspresso 10/28/2025|||
> I keep hearing that LLMs are trained on "Internet crap" but is it true?

Karpathy repeated this in a recent interview [0], that if you'd look at random samples in the pretraining set you'd mostly see a lot of garbage text. And that it's very surprising it works at all.

The labs have focused a lot more on finetuning (posttraining) and RL lately, and from my understanding that's where all the desirable properties of an LLM are trained into it. Pretraining just teaches the LLM the semantic relations it needs as the foundation for finetuning to work.

[0]: https://www.dwarkesh.com/p/andrej-karpathy

ACCount37 10/29/2025||
"Just" is the wrong way to put it.

Pretraining teaches LLMs everything. SFT and RL is about putting that "everything" into useful configurations and gluing it together so that it works better.

nutjob2 10/28/2025||||
> I'm sure it's curated to a large degree. They don't just scrape random pages and feed them into an LLM.

How would they curate it on that scale? Does page ranking (popularity) produce interesting pages for this purpose? I'm skeptical.

ACCount37 10/29/2025|||
It is true. Datasets are somewhat cleaned, but only somewhat. When you have terabytes worth of text, there's only so much cleaning you can do economically.
sunaookami 10/29/2025|||
We are talking about government-curated data here, the bias should be obvious. Popular LLMs still have huge bias problems but it would be way worse with only government-curated data.
saretup 10/28/2025||||
The entirety of the internet vs government-sanctioned homegrown EU data.
tonyhart7 10/28/2025||||
"But they were not trained on government-sanctioned homegrown EU data."

ok what are you implying on this

mock-possum 10/29/2025||
Sidesteps potential legal issues probably
raverbashing 10/28/2025|||
> But they were not trained on government-sanctioned homegrown EU data.

If none of the LLM makers used the very big corpus of EU multilingual data, I have an EU regulation bridge to sell you

tensor 10/28/2025|||
No, that's not how training works. It's not just about having an example in a given language, but also how many examples and the ratio of examples compared to other languages. English hugely eclipses any other language on most US models and that's why performance on other languages is subpar compared to performance on english.
Byamarro 10/28/2025|||
There's actually research showing that LLMs are more accurate when questions are in Polish: https://arxiv.org/pdf/2503.01996
megous 10/29/2025||
My first impulse is to say that some languages have better SNR on the internet. (less garbage autogenerated or SEO content compared to useful information)
andy12_ 10/28/2025||||
I have never noticed any major difference in performance of ChatGPT between English and Spanish. The truth is that as long as the amount of training data of a given language is above some threshold, knowledge transfers between languages.
FinnKuhn 10/29/2025||
The issue starts when an LLM transfers knowledge between languages even though that knowledge is not correct in that language. I have seen this with ChatGPT answers regarding laws, for example, where it refers to US laws when asked in German, which are obviously not relevant.
dragonwriter 10/29/2025||
> The issue starts when an LLM transfers knowledge between languages even though that knowledge is not correct in that language. I have seen this with ChatGPT answers regarding laws, for example, where it refers to US laws when asked in German, which are obviously not relevant.

There is no necessary correlation between language and the correct set of laws to reference. The language of the question (or the answer, if for some reason they are not the same) is an orthogonal issue to the intended scope. There is no reason US laws couldn't be relevant to a question asked in German (and, conversely, no reason US laws couldn't be wrong for a question asked in English, even if it was specifically and distinguishably US English.)

FinnKuhn 10/29/2025||
When you ask an LLM (in German) without further clarifying your location I expect it to refer to German (or Austrian/Swiss) laws.

For most questions it does this pretty well (e.g. asking for the legal age to drink). However, once the answer becomes more complex it starts to hallucinate very quickly. The fact that some of the hallucinations are just translated US laws makes me think that the knowledge transfer between languages is probably not helping in instances like this.

voxgen 10/28/2025|||
Ratio/quantity is important, but quality is even more so.

In recent LLMs, filtered internet text is at the low end of the quality spectrum. The higher end is curated scientific papers, synthetic and rephrased text, RLHF conversations, reasoning CoTs, etc. English/Chinese/Python/JavaScript dominate here.

The issue is that when there's a difference in training data quality between languages, LLMs likely associate that difference with the languages if not explicitly compensated for.

IMO it would be far more impactful to generate and publish high-quality data for minority languages for current model trainers, than to train new models that are simply enriched with a higher percentage of low-quality internet scrapings for the languages.

charlieyu1 10/28/2025|||
Training is a very different thing. Can’t speak for European, but LLMs are often much worse in Japanese because tokenisation used Unicode and a single Japanese character often has to be represented by more than one token
unscaled 10/29/2025||
I think you meant to say that tokenization is usually done with UTF-8 and a single Japanese character generally takes 3 or more code units (i.e. bytes). Unicode itself is not the culprit (in fact, even with UTF-16 tokenization, most Japanese characters would fit in a single code unit, and the ones that won't are exceedingly rare).

I have to admit I have not encountered significant mistokenization issues in Japanese, but I'm not using LLMs in Japanese on a daily basis. I'm somewhat doubtful this can be a major issue, since frontier LLMs are absolutely in love with Emoji, and most Emoji require at least 4 UTF-8 bytes, while most Japanese characters are happy with just 3 bytes.
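
For anyone who wants to check those byte counts, here is a quick Python sketch. The sample characters are my own picks for illustration, and this only measures encoding lengths; it says nothing about how any particular tokenizer would merge those bytes into tokens.

```python
# Compare UTF-8 byte counts and UTF-16 code unit counts for sample characters.
# The characters below are arbitrary examples chosen for illustration.
samples = {
    "hiragana (U+3042)": "あ",
    "common kanji (U+65E5)": "日",
    "rare kanji (U+20BB7)": "𠮷",
    "emoji (U+1F600)": "😀",
}


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units the character occupies in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


for name, ch in samples.items():
    print(f"{name}: {utf8_len(ch)} UTF-8 bytes, {utf16_units(ch)} UTF-16 code unit(s)")
```

Kana and common kanji sit in the Basic Multilingual Plane, so they take 3 UTF-8 bytes and a single UTF-16 code unit; characters outside it (rare kanji, most Emoji) take 4 bytes and a surrogate pair.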

intended 10/28/2025|||
Nope. Capability begins to degrade once you move away from english.

Plus all your T&S/AI Safety is not solved with translation, you need lexicons and data sets of examples.

Like, people use someone in Malaysia, to label the Arabic spoken by someone playing a video game in Doha - the cultural context is missing.

The best proxy to show the degree of lopsidedness was from this : https://cdt.org/insights/lost-in-translation-large-language-...

Which in turn had to base it on this: https://stats.aclrollingreview.org/submissions/linguistic-di...

From what I am aware of, LLM capability degrades once you move out of English, and many nation states are either building, or considering the option of building their own LLMs.

numpad0 10/28/2025|||
Not natively, they all sound translated in languages other than English. I occasionally come across French people complaining about LLMs' use of non-idiomatic French, but it's probably not a French problem at all, considering that this effort includes so many Indo-European languages.
FinnKuhn 10/28/2025||
I can at least also confirm this for German. Here is one example that is quite annoying:

ChatGPT for example tends to start emails with "ich hoffe, es geht dir gut!", which means "I hope you are well!". In English (especially American) corporate emails this is a really common way to start an email. In German it is not, as "how are you" isn't a common phrase used here.

ideasarecool 10/29/2025|||
The term "support" is vague. Can you do basic interaction in most other languages? Sure. Is it anywhere close to the competence it has in English? No. Most models seem to just translate English responses at a beginner's simplistic, monotone level.
whazor 10/28/2025|||
European governments have huge collections of digitalised books, research, public data.

But also European culture could maybe make a difference? You can already see big differences between Grok and ChatGPT in terms of values.

pembrook 10/28/2025||
If it's publicly available data, books and research, I can assure you the big models have already all been trained on it.

European culture is already embedded in all the models, unless the people involved in this project have some hidden trove of private data that they're training on which diverges drastically from things Europeans have published publicly (I'm 99.9% positive they don't...especially given Europe's alarmist attitude around anything related to data).

I think people don't understand a huge percentage of the employees at OpenAI, Anthropic, etc. are non-US born.

lm28469 10/28/2025|||
Meh, it depends a lot on the datasets, which are heavily skewed towards the main languages. For example they almost always confuse Czech and Slovak and often swap one for the other in the middle of chats
mirekrusin 10/28/2025|||
But the only way to unskew it is to remove main language data because there isn't really any to add, no?
tensor 10/28/2025||
You can also correctly bias your sampling so that when selecting new training instances each language is chosen equally. Generally the diversity of data is good, unless that data is "wrong" which, ironically, is probably most of the internet, but I digress.
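
A minimal sketch of that kind of rebalanced sampling, assuming a simple exponent-smoothing scheme (pick each language with probability proportional to its corpus size raised to `alpha`). The corpus sizes and function names are made up for illustration, and this is not a claim about what EuroLLM or any other model actually does.

```python
import random

# Hypothetical document counts per language (invented for illustration).
corpus_sizes = {"en": 1_000_000, "de": 200_000, "cs": 20_000, "ga": 1_000}


def sampling_weights(sizes: dict[str, int], alpha: float) -> dict[str, float]:
    """Return p(lang) proportional to size ** alpha.

    alpha = 1.0 keeps the natural (skewed) distribution;
    alpha = 0.0 picks every language with equal probability.
    """
    scaled = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(scaled.values())
    return {lang: s / total for lang, s in scaled.items()}


def sample_language(sizes: dict[str, int], alpha: float, rng: random.Random) -> str:
    """Draw the language of the next training instance."""
    weights = sampling_weights(sizes, alpha)
    langs = list(weights)
    return rng.choices(langs, weights=[weights[lang] for lang in langs], k=1)[0]


uniform = sampling_weights(corpus_sizes, alpha=0.0)  # each language equally likely
natural = sampling_weights(corpus_sizes, alpha=1.0)  # dominated by English
```

Values of `alpha` between 0 and 1 trade off the two extremes: small languages get seen more often than their share of the data, at the cost of repeating their documents more.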
RobotToaster 10/28/2025|||
Aren't they about as different as American English and British English?
svobodovic 10/28/2025||
The difference is larger than let's say just a "dialect". They really are different languages, even though we generally understand each other quite well (younger generations less so). I've heard it's about as different as e.g. Danish and Swedish - not sure if that comparison is helpful.
adt 10/28/2025||
The EuroLLM-9B model release is from Dec/2024, and scores just above random chance for benchmarks like MMLU-Pro (17.6%, random chance is 10%).

Comparison with similar EU models + 600 other highlights:

https://lifearchitect.ai/models-table/

hebejebelus 10/28/2025||
Some cursory clicking about didn't reveal to me the actual corpus they used, only that it is several trillion tokens 'divided across the languages'. I'm curious mainly because Irish (among some other similarly endangered languages on the list) typically has any large corpus come from legal/governmental texts that are required to be translated. There must surely be only a relatively tiny amount of colloquial Irish in the corpus. It would be interesting to see some evals in each language, particularly with native speakers.

I think LLMs may be on the whole very positive for endangered languages such as Irish, but before it becomes positive I think there's an amount of danger to be navigated (see Scots Gaelic wikipedia drama for example)

In any case I think this is a great initiative.

tadzikpk 10/29/2025||
That tracks. I learnt Gaeilge Uladh growing up and standard Irish feels like reading or writing a legal agreement compared to the spoken word…
Timwi 10/29/2025||
Can you provide a link about the “Scots Gaelic Wikipedia drama” you reference? I've heard of drama related to the Scots Wikipedia but that has nothing to do with Gaelic.
hebejebelus 10/30/2025||
My apologies, it was the Scots Wikipedia, careless of me. Link for context: https://en.wikipedia.org/wiki/Scots_Wikipedia#Controversy
srameshc 10/28/2025||
I was thinking the same, why are so many superior models coming from only countries like US and China. And why are European countries not in the list other than France with Mistral. Why are so few companies in India, Japan, South Korea even close to a promising new model like what Chinese companies did ?
nonethewiser 10/28/2025||
"Why" is a fair question but are you surprised? Europe is consistently behind in tech.

Europe has about 1.3 times the population of the USA and about 75% of the GDP yet EU tech output is a very small percentage of US tech output. We are not talking about 70, 50, 30, or even 20%. It's a drop in the bucket.

>The seven largest U.S. tech companies, Alphabet (Google), Amazon, Apple, Meta, Microsoft, Nvidia, and Tesla, are 20 times bigger than Europe’s seven largest, and generate 10 times more revenue.

https://eqtgroup.com/thinq/technology/why-is-europes-tech-in...

"Why" is a good question, but I definitely wouldn't expect significant competition in LLMs from Europe based on the giant tech disparity. Having 1 non-cutting edge model that isn't really competitive is pretty much what I would expect.

InsideOutSanta 10/28/2025|||
> The seven largest U.S. tech companies (...) are 20 times bigger than Europe’s seven largest, and generate 10 times more revenue.

I'm going to guess that this part is intentional. Europe tends to be more aggressive in enforcing antitrust laws. Economically, Europe's goal isn't to have the biggest companies but to have more smaller companies.

So you're not going to get companies like Google, but you will get companies like Proton, Spotify, Tuta, Hetzner, Mistral, Threema, Filen, Babbel, Nextcloud, CryptPad, DeepL, Vivaldi, and so on.

nonethewiser 10/28/2025|||
>I'm going to guess that this part is intentional. Europe tends to be more aggressive in enforcing antitrust laws. Economically, Europe's goal isn't to have the biggest companies but to have more smaller companies.

So is your hypothesis that the total market cap of EU tech companies is something like 50,60,70, etc. % of total US tech marketcap? Something significantly different than the ~10% implied by that figure (largest us companies 10x largest EU companies). And it's just more broadly distributed?

Hard to find data on this but this is showing EU tech market cap at 3.2T. https://www.stateofeuropeantech.com/chapters/outcomes

Whereas this is saying the US "megacaps" ($200B+) are at 21T. https://www.cnbc.com/2025/09/05/tech-megacaps-worth-market-c...

Which puts the entire EU tech market at 15% of the US megacaps. Not even the entire market.
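
Spelling out that arithmetic as a quick check, taking the two figures from the links above at face value (the variable names are mine):

```python
# Figures as quoted above, in trillions of USD.
eu_tech_market_cap_tn = 3.2  # entire EU tech market
us_megacaps_tn = 21.0        # US "megacaps" ($200B+) only

ratio = eu_tech_market_cap_tn / us_megacaps_tn
print(f"EU tech market cap is {ratio:.1%} of US megacap market cap")
```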

layer8 10/28/2025||
European companies are smaller on average and less likely to go public in general, so market cap comparisons don’t show the whole picture. Growing big is less often seen as a goal than in the US. “Megacaps” aren’t necessarily considered a healthy thing to have.
jimbokun 10/28/2025||
Yes, and this all but guarantees that Europe will stay behind USA and China in their technology capabilities.
mjburgess 10/28/2025|||
What are these capabilities?

I don't see any sense in which the EU has fewer capabilities. It has, say, a smaller number of businesses with smaller market dominance.

It isnt clear to me what capability the EU would gain by having a monopolist social network, a monopolist search engine, a monopolist advertising trader

jimbokun 10/28/2025||
Europe has all of those things, they just come from the US.
layer8 10/29/2025|||
If it means sticking to their values, then so be it.
panick21_ 10/29/2025|||
Antitrust laws are not the reason for more smaller companies. Getting an antitrust case off the ground in Europe is very hard, not US hard but still hard. The reasons are more complex than that.
emporas 10/28/2025|||
Also, commercial software is consistently behind open source.

I only use open source LLMs for writing (Qwen 32b from Groq) and an open source editor, of course: Emacs.

If some people can write better using commercial LLMs (and commercial editors), by all means, but they put themselves at a disadvantage.

The next step for me is to use something open source for translation (I use Claude for the moment) and for programming (I use GPT currently). In less than a year I will find a satisfying solution to both of these problems. I haven't looked deep enough yet.

neoromantique 10/28/2025||
What a weird comment.

llama-3.1-70b-versatile is pretty good at translating though

sublimefire 10/28/2025|||
As a European citizen I think it boils down to access to capital. The EU/EEA is not a country and the market is fairly fragmented. The big players are the UK, France, and Germany; everyone else does not have the same access to money as, say, in the US. Folks want to do it but there is a glass ceiling. Hence you have these collabs among large institutions to tap into funds such as Horizon, which are academic in nature and do not translate well into products.
izacus 10/29/2025||
The fun part is that people whining about not being able to raise common capital and operate across the whole EU due to regulation tend to also be the most rabid opponents of any kind of common regulation that would bring the whole EU into alignment and make it a less fragmented market.
loandbehold 10/28/2025|||
Because training a frontier model is expensive, and only the US and China have the capital structure to raise tens of billions of dollars to do it.
lossolo 10/28/2025|||
You can easily fit the whole datacenter in under 10 billion; after that you only pay for electricity, maintenance, and staff. 100k GPUs cost a few billion USD, which is more than enough to train frontier models, run experiments, and serve models in the EU to start. Look at what xAI did and how much it cost them, and it's more expensive to do in the US than in the EU.
busssard 10/28/2025|||
Being able to train new frontier models is the new equivalent of nuclear capabilities.

I predict at some point countries will get CIA'ed when they publish plans to build a large data center.

Similar to the time when they got CIA'ed when announcing plans for new nuclear plants.

henriquenunez 10/28/2025||
They are already CIA'ed on a regular basis for much less than that.
sunaookami 10/28/2025|||
EU made a >900 page law about AI and patted themselves on the back for being "the first to regulate AI" (which was not even true, China had an AI law before and it's two pages long).
sajithdilshan 10/28/2025||
This cannot be stressed enough. In my experience working in multiple tech startups in Germany, the power that compliance, legal, and the other second-line functions have over engineering is immense. Most of the time they act as a hindrance to innovation rather than a supporting factor.

This AI law is a clear example of that. Pencil pushers creating more obstacles for the sake of creating more obstacles rather than actually taking a pragmatic approach.

isodev 10/28/2025||
It's strange, my real life experience is very different than yours. Unless you're training AI to do something shady, it's really no bother at all. In fact, most of what the AI Act requires, you have to do anyway for a good model card.
airspresso 10/28/2025||
I agree. And I also know how much of that experience comes from having a legal dept. that is collaborative and supportive of what the tech org wants to do. Which I suspect is quite rare.
isodev 10/28/2025|||
Because the value of these models is (actually) yet to be proven. Why saturate the market with something that we already have at least one of and others are selling as a service? No model provider (including the "big ones" like OpenAI) has been able to produce a viable business case. They're all literally running on government deals and investor money.
mensetmanusman 10/30/2025||
It’s been proven that it is valuable to be able to convert English into executable code that does what is wanted.
apples_oranges 10/28/2025||
Does it even make sense? Just use the American or Chinese ones and adjust as needed. Where’s the point in spending millions to build the same thing or worse?
t43562 10/28/2025||
Now that the big bets have been made, who wants to try to compete with them?
sireat 10/28/2025||
It is interesting how much traction this 9B model is getting, which is good.

Still, two months earlier a 30B-parameter model covering 19 European languages got almost no mention:

https://huggingface.co/TildeAI/TildeOpen-30b

Mind you, that is another open model that is begging for fine-tuning (it is not very good out of the box).

jamesblonde 10/29/2025||
The leading European e-commerce company, Zalando, with 50m users, is now using the leading European AI platform, Hopsworks, to power their real-time AI. Zalando are Databricks' largest EU customer, but they are using Hopsworks instead for operational AI.

You would never hear it, though, as the European IT press only promotes SV startups.

https://www.youtube.com/watch?v=u8QFiLhnuFg&feature=youtu.be

Disclaimer: I work at Hopsworks.

extraduder_ire 10/28/2025|
From the EuroLLM-9B page on Hugging Face:

>You need to agree to share your contact information to access this model

Is this common? I've never seen it on the site before, and it isn't on the smaller model. What are they collecting this information for?

ks2048 10/28/2025||
I'm not sure which models require this and why, but I've come across it, e.g. on the llama models: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
extraduder_ire 10/29/2025||
I'm less surprised about llama having that requirement with its custom license, but EuroLLM-9B is Apache 2.0.
airspresso 10/28/2025||
Yes, that's quite common.