Apertus – Open Foundation Model for Sovereign AI

Posted by T-A 1 day ago

Apertus – Open Foundation Model for Sovereign AI (apertvs.ai)

516 points | 172 comments | page 4

iamyemeth 13 hours ago||

> Conclusion There are 2 r's in the word "strawberry".

Not looking good so far

sigmoid10 13 hours ago|

I guess they still use a tokenizer? Why would this kind of issue be solved? The model fundamentally can't see the word character by character like you do. With o200k tokenizers, for example, what the model sees is 3 tokens: [302, 1618, 19772]. These are shown to you as ["st", "raw", "berry"] in the UI. The only way any model can infer individual characters is by using external tools, implicit knowledge picked up during training, or (what many of the big labs apparently do) special training for these edge cases, which fails once the next special case comes along.
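To make that concrete, here is a minimal sketch of the idea. It is a toy, not a real tokenizer: the vocabulary holds only the three pieces and IDs quoted above (taken as given from this comment, not checked against an actual o200k encoding), and greedy longest-match stands in for real BPE merging. The names `TOY_VOCAB`, `encode`, and `decode` are made up for illustration.

```python
# Toy vocabulary: pieces and IDs are copied from the comment above and are
# NOT verified against a real o200k tokenizer.
TOY_VOCAB = {"st": 302, "raw": 1618, "berry": 19772}


def encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization (a simplification of real BPE)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return ids


def decode(ids, vocab=TOY_VOCAB):
    """Map IDs back to their text pieces and join them."""
    pieces = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(pieces[token_id] for token_id in ids)


word = "strawberry"
token_ids = encode(word)

print(token_ids)        # [302, 1618, 19772] -- all the model receives
print(word.count("r"))  # 3 -- trivial once you have the characters
```

Nothing in the three integers says how many r's sit inside each piece. Counting letters means going back through `decode`, which is the "external tool" route, or the model having memorized the spelling of each piece during training.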

maxloh 1 day ago|

Great to see more fully open LLMs.

I think a problem with open-weight models is that while you can improve them, you are not going to create the next generation of LLMs by fine-tuning. We are at the mercy of frontier labs for access to SOTA LLMs. For example, Anthropic recently started requiring identity verification for Claude [0], same for OpenAI [1].

If one day China's distillation labs stop releasing their LLMs as open-weight, I doubt American labs will continue to release free LLM weights without that competition.

That's where fully open pipelines shine: they enable the community to create the next generation of SOTA LLMs. That is the only way LLMs truly become sovereign.

[0]: https://news.ycombinator.com/item?id=48618455

[1]: https://news.ycombinator.com/item?id=48618606

anon373839 1 day ago||

> China's distillation labs

This notion that Chinese labs are merely distilling frontier models is quite an unwarranted slur. Those labs have published WAY more useful research than US labs on RL techniques, novel model architectures, training pipelines, etc. They have also hit intelligence-per-parameter densities that US labs have yet to attain.

Apart from that, merely training a model on outputs from another model, off policy and without the logits, doesn’t really work that well.

The Chinese labs know how to build frontier level models. GLM-5.2 shows that they no longer even need Nvidia chips to do it.

trollbridge 22 hours ago|||

It's one of those lies people tell themselves to make themselves feel better. "Oh, they're just copying my stuff."

Chinese labs are basically just telling everyone, out in the open, what they're doing and how to do it, and the answer from American frontier labs is "Well, they couldn't possibly be getting the results they're getting without just distilling our models," and the American labs aren't even trying to do some of the stuff like DS's aggressive caching to get costs down.

Vaslo 23 hours ago||||

I recently watched a video of one of these “Chinese models”; it kept insisting it was Claude when the user asked. Sorry, there’s no “slur” here, just legit suspicion.

c0rruptbytes 23 hours ago|||

https://blog.kilo.ai/p/did-claude-opus-48-distill-alibabas

it happens to all models…when the internet is increasingly generated, things happen

anon373839 22 hours ago|||

These anecdotes where someone gets the model to claim it is X model are meaningless. (Claude also has been known to claim it is Deepseek when asked in Chinese.)

trollbridge 22 hours ago||

As anyone knows who's tried to write an AGENTS.md that says "Place an Assisted-by: git trailer that contains the harness you're using:whatever model this is", such a naive approach often results in a seemingly random model being named.

halJordan 23 hours ago|||

But have they? I understand that the Chinese side is illuminated and the American side is dark. I disagree that the Chinese labs have created anything that isn't in an American research lab or production dc. Sure the Chinese have published their findings and not for nothing. But are they novel? Unlikely imo

chriskanan 23 hours ago||

They are doing a tremendous amount of novel research, while American AI companies have "war rooms" to study their papers and models, and American labs publish next to nothing. They often have to do more with less. As an AI researcher (I've been working on neural networks for 25+ years), I think Chinese labs are doing science a tremendous service, whereas some American companies (and I'm American) seem to think only they are able to do AI research responsibly. I'm pretty sure Fable sabotaged my research codebase (see the news stories about this).

david_shi 20 hours ago||

Whoa, say more about Fable sabotaging your codebase?

dofm 1 day ago||

> We are at the mercy of frontier labs for access to SOTA LLMs

I disagree with this use of SOTA, and this topic is why.

Anthropic and OpenAI have “cutting-edge” models. These are beyond the state of the art but they are closed, secretive, hard to quantify.

The “state of the art” is open source, open weights models that can be inspected, studied, shared and critiqued, because that is what is meant by “the art”: it is the knowledge and principles and evidence and materials available to all. The “state of the art” is the highest point of that.

I wish we could make this distinction and stop blessing two secretive, unverifiable loss-making companies with so much power.

(Putting that aside, I suspect, without evidence, mind you, that the endless march to solving models by making them bigger is not the solution anyway.)

MangoCoffee 22 hours ago|||

SOTA LLMs are less important than cheap tokens, and Chinese AI labs are releasing models that are only about 6-8 months behind American AI labs.

Chinese models like GLM are getting better at coding tasks, and they're cheaper. Microsoft GitHub Copilot has had to switch to token-based billing. The cost of AI has increased since agents came into play. Whoever can offer cheaper tokens to do the task will win.

Even Microsoft is looking into DeepSeek for cheap tokens.

https://www.axios.com/2026/06/16/microsoft-copilot-cowork-to...

sockaddr 1 day ago|||

Sorry, but I think your requirement that something only be “the art” if any arbitrary person can critique it is off. The frontier labs are working on the state of the art, but it’s just art that you aren’t allowed to see. Unfortunately.

dofm 1 day ago|||

It is work using the principles of the art, obviously.

But "state of the art" implies the highest state of general availability, not just in terms of access to some product, but of use of the ideas, concepts, methodologies etc.

Anthropic and OpenAI have "cutting edge" models; the state of the art is behind the cutting edge.

The state of the art is the best open source, open weights model available. More or less by definition.

I am probably tilting at windmills here.

bnj 23 hours ago|||

I appreciate this distinction. There are multiple senses of SOTA, and one that has been taking on greater mindshare is as a synonym of “the best available”. By rebasing on SOTA as generally available and understood versus cutting edge, which has limited distribution and leads the way, we expand the vocabulary we have available to describe what’s going on. Thanks.

toss1 22 hours ago|||

That's an interesting and possibly useful distinction, but it seems unique to you. Spreading it as "We should categorize the AIs this way" would be a good argument.

But the way SOTA is generally understood by other users of the language, it refers to exactly the team, technology, & techniques defining the cutting edge in any field, regardless of whether the technology & techniques are available outside of that team...

dofm 12 hours ago||

Not so much, it turns out.

https://english.stackexchange.com/questions/239963/do-state-...

8note 1 day ago|||

the art is the standard engineering practices that go into building the thing

it's things you would be trained in as part of a bachelor's degree and some graduate coursework