Meta got caught gaming AI benchmarks

Posted by pseudolus 4/8/2025

Meta got caught gaming AI benchmarks (www.theverge.com)

347 points | 161 comments | page 2

labrador 4/8/2025|

Meta does themselves a disservice by having such a crappy public-facing AI for people to try (meta.ai). I regularly use the web versions of GPT-4o, Deepseek, Grok, and Google Gemini 2.5.

Meta is always the worst so I don't even bother anymore.

openplatypus 4/9/2025||

Meta doing something dodgy OR unethical OR criminal ... and nobody is surprised.

goldchainposse 4/8/2025||

In other news, the head of AI research just left

https://www.cnbc.com/2025/04/01/metas-head-of-ai-research-an...

kridsdale3 4/8/2025||

I would have thought that title would belong to Yann.

brandall10 4/8/2025|||

It's a misnomer - the VP left. Yann is the Chief Scientist, which I imagine most would agree would be the 'head' of a research division.

yodsanklai 4/8/2025|||

TBH I'm very surprised Yann Le Cun is still there. He looks to me like a free thinker and an independent person. I don't think he buys into the Trump agenda and US nationalistic anti-Europe speech like Zuck does. He may be giving Zuck the benefit of the doubt, and probably is grateful that Zuck gave him a chance when nobody else did.

lenerdenator 4/8/2025||

> TBH I'm very surprised Yann Le Cun is still there. He looks to me like a free thinker and an independent person. I don't think he buys into the Trump agenda and US nationalistic anti-Europe speech like Zuck does. He may be giving Zuck the benefit of the doubt, and probably is grateful that Zuck gave him a chance when nobody else did.

Zuck doesn't buy it, either. He just knows what's good for business right now.

In an example of the worst person you know making a great point, Josh Hawley said "What really struck me is that they can read an election return." [0].

Though it's worth remembering, it's very difficult to accumulate the volume of data necessary to do the current kind of AI training while sticking to the strictest interpretations of EU privacy law. Social media companies aren't just feeding user data into marketing algorithms, they're feeding it into AI models. If you're a leading researcher in that field, like Le Cun, and the current state-of-the-art means getting as much data as possible, you might not appreciate the regulatory environment of the EU.

[0] https://www.npr.org/2025/02/27/nx-s1-5302712/senator-josh-ha...

nailer 4/8/2025||

A lower level employee also resigned specifically about this:

https://x.com/arjunaaqa/status/1909174905549042085?s=46

goldchainposse 4/8/2025||

I wonder how much the current work environment contributed to this. There's immense pressure to deliver, so it's not surprising to see this.

kittikitti 4/8/2025||

For me at least, the 10M context window is a big deal and as long as it's decent, I'm going to use it instead. I'm running Scout locally and my chat history can get very long. I'm very frustrated when the context window runs out. I haven't been able to fully test the context length but at least that one isn't fudged.

casey2 4/9/2025||

This feels like AI deniers grasping at straws. You have to prove the claim that "friendly" models do better in head-to-head user ratings. You also have to adequately define your claim, which hasn't been done. And then you have to prove that "friendliness", by whatever definition you've constructed, made a significant difference in the benchmark.

What I suspect will happen is that someone will just tell the LLM to "be curt", which may cause its score to drop, and people will unthinkingly take that to mean the above nonsensical claims are true.

jerrygoyal 4/9/2025||

lmarena has lost credibility for me. Are there any better alternatives out there?

TheJCDenton 4/9/2025||

The latest LLM launches from top US AI labs were rather disappointing. Products seem underbaked and mainly reactive... They oversold the capabilities of their future models when Deepseek landed, to stay in the news at the time, but now that we look at the results, it's not what everyone expected. And people are starting to question the hype, which is healthy but also very dangerous for these companies needing very large capital. It seems that some labs like Meta are onto something, but it's more research material for now than a short-term product. Interesting times.

badmonster 4/9/2025|

wow, is that real?
