Posted by dworks 7/6/2025

Whistleblower: Huawei cloned Qwen and DeepSeek models, claimed as own (dilemmaworks.substack.com)
119 points | 58 comments | page 2
kkzz99 7/6/2025|
Remember that there was a Huawei Lab member that got fired for literally sabotaging training runs. Would not be surprised if that was him.
yorwba 7/6/2025|
I think the case you're talking about is this one: https://arstechnica.com/tech-policy/2024/10/bytedance-intern... where it was a ByteDance intern.
tengbretson 7/6/2025||
In the LLM intellectual property paradigm, I think this registers as a solid "Who cares?" level offence.
brookst 7/6/2025||
The point isn’t some moral outrage over IP, the point is a company may be falsely claiming to have expertise it does not have, which is meaningful to people who care about the market in general.
tonyedgecombe 7/6/2025|||
Nobody who pays attention to Huawei will be surprised. They have a track record of this sort of behaviour going right back to their early days.
npteljes 7/6/2025||
While true, these sorts of reports are the track records which we can base our assessments on.
some_random 7/6/2025|||
Claiming to care deeply about IP theft in the more nebulous case of model training datasets then dismissing the extremely concrete case of outright theft seems pretty indefensible to me.
Arainach 7/6/2025|||
Everyone has a finite amount of empathy, and I'm not going to waste any of mine on IP thieves complaining that someone stole their stolen IP from them.
some_random 7/8/2025|||
I'm not asking you to cry or even stifle a laugh; the only thing I'm criticizing is an uneven application of claimed values.

Edit: Or an argument as to why this IP theft is fine while that used in training isn't. I'm sure some of that training data was CC-SA licensed for instance ;)

mensetmanusman 7/6/2025|||
It’s theft in the way taking a picture of nature that you had nothing to do with is theft.
Arainach 7/6/2025||
This line of argument was worn out and tired when 14 year olds on Napster were parroting it in 1999.
pton_xd 7/6/2025||||
> dismissing the extremely concrete case of outright theft seems pretty indefensible to me.

Outright theft is a meaningless term here. The new rules are different.

The AI space is built on "traditionally" bad faith actions. Misappropriation of IP by using pirated content and ignoring source code licenses. Borderline malicious website scraping. Recitation of data without attribution. Copying model code / artifacts / weights is just the next most convenient course of action. And really, who cares? The ethical operating standards of the industry have been established.

perching_aix 7/6/2025|||
Par for the course for emotional thinking, I'm not even surprised anymore.
didibus 7/6/2025|||
Ya, the models have stolen everyone's copyrighted intellectual property already. Not sure I have a lot of sympathy. In fact, the more the merrier: if we're going to brush off that they're all trained on copyrighted material, we might as well make sure they end up a really cheap, competitive, low-margin, accessible commodity.
lambdasquirrel 7/6/2025||
Eh... you should read the article. It sounds like a pretty big deal.
didibus 7/6/2025||
I did read the article. Apart from the fact that it sounds like a terrible place to work, I'm not sure I see what the big deal is.

No one knows how any of the models got made; their training data is kept secret, we don't know what it contains, and so on. I'm also pretty sure a few of the main model makers poached each other's employees, who just reimplemented the same training models with some twists.

Most LLMs are also based on initial research papers where most of the discovery and innovation took place.

And in the very end, it's all trained on data that very few people agreed or intended would be used for this purpose, and for which they all won't see a dime.

So why not wrap and rewrap models and resell them, and let it all compete for who offers the cheapest plan or per-token cost?

esskay 7/6/2025|||
It is very hard to have any sympathy, they stole stolen material from people known to not care they are stealing.
mathverse 7/6/2025||
[flagged]
oblio 7/6/2025||
Didn't know Sam Altman was Chinese :-)
typon 7/6/2025||
LLMs are all built on stolen data. There is no such thing as intellectual property in LLMs.
mattnewton 7/6/2025||
That’s not the point IMO; the point was this was being used to display capabilities to train models with Huawei software and hardware.
mensetmanusman 7/6/2025||
/robots that read books in the library are stealing/
hereme888 7/6/2025||
[flagged]
jambutters 7/6/2025|
I don't think anyone cares about that. OpenAI ripped off the internet and books. DeepSeek distilled some of OpenAI and pushed the field forward.
mystraline 7/6/2025|
[flagged]
knowitnone 7/6/2025|
"Google cloned Linux kernel, claimed as own." Link?
owebmaster 7/6/2025||
Android