Teuken-7B-Base and Teuken-7B-Instruct: Towards European LLMs (2024)

Posted by doener 4/15/2025

Teuken-7B-Base and Teuken-7B-Instruct: Towards European LLMs (2024) (arxiv.org)

248 points | 95 comments | page 2

NetOpWibby 4/15/2025

Upset that my mind went, "TEKKEN 7 LLM." Imagine Heihachi Mishima vibe-coding for you.

htrp 4/15/2025

TIL there are European versions of ARC, HellaSwag, MMLU, and TruthfulQA.

smokel 4/15/2025

A paper on languages that begins with a grammatical error in the first sentence does not inspire confidence:

> LLMs represents a disruptive technology

NitpickLawyer 4/15/2025

Hey, at least it's not generated by ChatGPT :D

Funny how LLMs now write more cleanly than humans in most cases.

spacebanana7 4/15/2025

I imagine there was a similar tipping point in the Industrial Revolution where machines started making "better" manufactured items than artisans.

InsideOutSanta 4/15/2025

Interestingly, we then collectively decided that, in many cases, imperfect artisanal things were better than perfect industrially produced things. So maybe people will start intentionally putting mistakes into their texts to prove they're not machines.

I'm already reluctant to use the em-dash correctly because so many people think only LLMs know how to use it.

Miraltar 4/15/2025

It's not that I think only LLMs know em-dashes, but they abuse them so much that I get annoyed every time I see one.

spacebanana7 4/15/2025

LLMs seem to use them much more than normal people.

If I were a teacher marking homework, em-dashes would be at least an amber flag for LLM use.

croes 4/15/2025

Given that it’s about non-English languages, it is forgivable.

YetAnotherNick 4/15/2025

They compared against Llama 3.1 and found Llama 3.1 to be better on average on their own tasks, such as European MMLU. And Llama 3.1 is the worst of the current batch, with Qwen 2.5 and Gemma 3 being significantly better.