Posted by chabons 5 hours ago

Muse Spark: Scaling towards personal superintelligence(ai.meta.com)
https://meta.ai/
188 points | 243 comments | page 2
bguberfain 3 hours ago|
We all know it... but I think they were very bold in this warning about using your private messages to train public models. _Your messages with AIs will be used to improve AI at Meta. Don't share information, including sensitive topics, about others or yourself that you don't want the AI to retain and use_
discopicante 3 hours ago|
meta doesn't exactly instill confidence in using personal data responsibly. hard pass
throwaw12 4 hours ago||
How is it that Meta spent so much money on talent and hardware, but the model barely matches Opus 4.6?

Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else is dumber compared to the talent Anthropic has

strulovich 4 hours ago||
Meta made a bunch of mistakes, and it looks like Zuckerberg spent a lot of money on talent and made big swings to change that (this happened about a year ago)

I think it’s unrealistic to expect them to come back from that pit to the top in one year, but I wouldn’t rule them out getting there with more time. That’s a possible future. They have the money and Zuckerberg’s drive at the helm. It can go a long way.

solenoid0937 4 hours ago|||
It's benchmaxxed.

If they actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)

throwaw12 4 hours ago||
how do you know it's benchmaxxed?
solenoid0937 4 hours ago|||
Friends at Meta with access to the model + personal experience at Meta.

Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.

luma 3 hours ago||||
For one, they aren't using the latest version of many of the benchmarks. eg, ARC-AGI 2 and not 3, etc.
prodigycorp 4 hours ago|||
meta's benchmaxing tendencies are well known. llama4 was mega benchmaxxed, there's nothing that suggests to me that meta's culture has changed.
spindump8930 2 hours ago||
Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
coffeebeqn 4 hours ago|||
Matching Opus 4.6 would be pretty good? It’s the SOTA actually available model
reissbaker 3 hours ago||
Muse Spark doesn't even match GLM-5.1 on most benchmarks. And GLM is open source!
impulser_ 4 hours ago|||
It's not even on par with Sonnet. It's on par with open source models, and it's not even open source and sits behind a private preview API.

Might as well not release anything.

CuriouslyC 1 hour ago|||
Anthropic has just been focused on coding/terminal work longer mostly, and their PRO tier model is coding focused, unlike the GPT and Gemini pro tier models which have been optimized for science.

Their whole "training the LLM to be a person" technique probably contributes to its pleasant conversational behavior, and making its refusals less annoying (GPT 5.2+ got obnoxiously aligned), and also a bit to its greater autonomy.

Overall they don't have any real moat, but they are more focused than their competition (and their marketing team is slaying).

zozbot234 56 minutes ago||
Autonomy for agentic workflows has nothing to do with "replying more like a person", you have to refine the model for it quite specifically. All the large players are trying to do that, it's not really specific to Anthropic. It may be true however that their higher focus on a "Constitutional AI"/RLAIF approach makes it a bit easier to align the model to desirable outcomes when acting agentically.
username223 4 hours ago|||
Facebook is working with the talent that can’t find a job at some other company. It doesn’t surprise me they ship mediocrity.
zozbot234 4 hours ago||
> has some secret sauce

Yup, it's called test-time compute. Mythos is described as plenty slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.

throwaw12 4 hours ago||
> it's called test-time compute.

Why can't others easily replicate it?

coder68 4 hours ago||
I have not delved into the theory yet but it seems that the smaller open-source models do this already to an extent. They have fewer parameters, but spend much more time/tokens reasoning, as a way to close the performance gap. If you look at "tokens per problem" on https://swe-rebench.com/ it seems to be the case at least.
ddp26 4 hours ago||
The second paragraph starts "Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts. To support further scaling, we are making strategic investments..."

This article is about Meta, not about the user. Who signs off on these? Is the intended audience other people at Meta, not the user?

tjkrusinski 4 hours ago||
The article is published primarily to signal to the market that Meta is serious in its efforts to compete in building frontier AI models.

They want to 1) attract talent, 2) tell Wall Street they can play in this space as well, 3) help employees feel the company is moving in the right direction.

A frontier LLM doesn't apply to their core consumer products.

Lihh27 4 hours ago||
the blog is the product. investor deck posted as a tech launch
conradkay 4 hours ago||
Stock up 9% today, very pleasant for Zuck if you do the math on his net worth :)
hungryhobbit 3 hours ago||
I mean, kinda? It's not like Zuck is selling his stock tomorrow, so daily fluctuations in stock price don't really affect him.
leumon 15 minutes ago||
pelican riding a bicycle (svg): https://files.catbox.moe/u5yc0x.png
hvass 2 hours ago||
Genuine question: Why release this the day after Mythos? It does not appear SOTA (just based on benchmarks). OpenAI will likely release Spud tomorrow.
paxys 4 minutes ago||
Mythos is a news article. This is an actual model you can use.
eranation 2 hours ago|||
That's a really good question, my sarcastic mind thinks that Anthropic rushed the Mythos announcement out of fear of Meta stealing their thunder... (I guess someone leaked that, a LOT of Anthropic folks are ex-Meta... so, you know)

Just speculation, I have no real knowledge about it.

MattRix 12 minutes ago||
I think Anthropic did the mythos announcement to undercut OpenAI’s upcoming next model announcement, not Meta’s.
MattRix 12 minutes ago||
Why not? Not everything has to be SOTA to be interesting.
gallerdude 4 hours ago||
This would have been an amazing release 6 months ago. But the industry moves so fast, this is a trite release. Maybe it’s best for Meta to sell their superintelligence division. I don’t think Zuck’s vision is particularly compelling.
gordonhart 4 hours ago||
A new model comparable (ish) to the Claude/Gemini/GPT flagships is a big deal for the industry and for Meta even if it doesn't set the new frontier.
gallerdude 4 hours ago|||
I’m not sure. If it was open source, certainly. But 4th place doesn’t really matter if you have nothing different to add.
lairv 4 hours ago|||
If the model is truly on par with Opus 4.6/Gemini 3.1/GPT 5.4 (beyond benchmarks) this still puts MSL in the frontier lab category, which is no small feat given that they pretty much rebooted last year

Many labs aren't able to keep up with the frontier: xAI, Mistral

datadrivenangel 4 hours ago|||
Fourth place means you're not reliant on any of the external providers for internal AI use, which is important for organizational health and negotiating with those other providers.
rubyn00bie 3 hours ago||
I’m not sure it’s useful for negotiating, the capex to build it was surely orders of magnitude more than it would cost to just use one of the other frontier models.

It’s like someone negotiating by saying, “I’ll waste even MORE money to build something worse if you don’t give me a deal.”

I’m not discounting there may be other advantages to doing it. I just don’t think negotiating is one.

blahblaher 4 hours ago||||
Why would you use this instead of the other more proven models? Unless it's significantly cheaper. The general population mostly wants it free, and the more professional users are willing to pay for good/better responses.
NitpickLawyer 4 hours ago|||
You wouldn't use this as an API. You would "use" this inside the Meta properties. Have a shop on fb marketplace? Now you have copy, images, support, chat, translations, erp, esp, fps and all the other acronyms :) and so on for your mom and pop shop at $200/mo. Probably worse than say claude/gemini but it's right there, one button away. "Click here to upgrade to AI++" or something.
gallerdude 3 hours ago||
But rolling your own can’t be that much cheaper than buying it from a leading lab. Especially when you consider the amount of spending on datacenters.
hnav 3 hours ago||
leading labs are going to be tightening the screws. Otherwise why not just run the entire company on a public cloud?
gordonhart 3 hours ago|||
I won't use it, but I'm excited to see it for the same reason why I'm excited to see a near-frontier open-source release: more competition pushes prices down and reduces monopoly/cartel risk. I won't use Muse or Grok or GLM at this point but they're good for the ecosystem.
zozbot234 4 hours ago|||
Their new Contemplating mode gives this model a Deep Research ability (akin to existing models from GPT and Gemini) that might make it quite comparable to the just-announced Mythos.
solenoid0937 4 hours ago|||
Mythos is a much bigger pre train, Contemplating is not the same thing.
zozbot234 4 hours ago||
> Mythos is a much bigger pre train

Do we have data to substantiate that claim?

solenoid0937 4 hours ago||
It's pretty common knowledge. Spud is the only other PT comparable with Mythos.

Both Spud and Mythos can also scale via inference time compute.

Meta simply did not have enough compute online, long enough ago, to have a similar PT.

temp_praneshp 3 hours ago|||
> might make it quite comparable to the just-announced Mythos

Do we have data to substantiate that claim?

dgellow 4 hours ago|||
I never understood why Meta decided to join the race. They don’t sell compute like Google or Microsoft. Why not let others do the hard work and integrate their LLMs in your systems if needed? I assume it’s because they have Instagram, Facebook, WhatsApp, Threads data and feel they should be the ones using it for training, but it’s really not obvious how having a frontier AI lab benefits their business
observationist 4 hours ago|||
Adtech Money. They've got GPUs, they've got the infrastructure, and they've got the advertisement platform, and the point is getting AI that can exploit the adtech and create a flywheel effect, maximizing return from the data they collect from Insta, WhatsApp, Facebook, etc.

It's not just about LLMs, it's about being able to model consumers and markets and psychology and so on. Meta is also big in the manipulation side of things, any sort of cynical technological exploitation of humans you can imagine but that is technically legal, they're doing it for profit.

bachmeier 3 hours ago||||
> I never understood why meta decided to join the race.

I can think of at least two reasons: price and customizability. If they train their own models on their own data, they potentially have a better model at a better price, and they're not at the mercy of Anthropic's decisions when it decides to raise prices. Additionally, if you use someone else's model, you use it the way they create it and permit you to use it. In a couple of years, who has any idea how these models will be used? Arguably, a company the size of Meta should be in control of its AI models.

eldenring 4 hours ago||||
Because there's a realistic chance this is the only important software technology moving forward, and it commoditizes Meta's entire business, which is software.
dgellow 2 hours ago||
Meta’s business is human attention, human connections, and all derived data. They can use AIs for their systems, but the question is why do they feel the need to spend billions on training and running their own frontier model
vinni2 3 hours ago||||
From what I heard Meta is spending hundreds of millions each month on Claude credits for developers. So that’s a huge saving if they have their own models that match Opus.
spindump8930 2 hours ago||
Spending tons of money on Claude and the recent token benchmarks came WELL after Meta's huge investments in compute infrastructure for AI as well as the long history of language model development inside science divisions at the company.
xnx 4 hours ago||||
Zuck is trying to convince himself he's good, and not just lucky.
chermi 3 hours ago||||
You basically have to be involved if you're meta. Even if there's only 5% chance this AI stuff is as disruptive as the labs claim it is, you can't afford to miss out. Even if you're lagging frontier, you must develop the competency internally. Otherwise you ignored a 5% chance of total annihilation, probably even exposing you to shareholder lawsuits.
SoftTalker 3 hours ago||||
LLMs/Chat-based systems will reach a point where Facebook, WhatsApp, Threads, Instagram, etc. are all unnecessary. The idea of opening a browser or a specific app to do a thing will seem antiquated. You can do it all with your chat-based agent. Meta wants to be part of that.
operatingthetan 3 hours ago|||
I don't think everyone only wants to talk to machines going forward...?
SoftTalker 3 hours ago||
I don't want to do it now. But that seems to be where we are being headed, like lemmings running for the cliff.
dgellow 3 hours ago|||
Sure but they have the platforms, they don’t need their own frontier models for that
SoftTalker 2 hours ago||
The platforms will be irrelevant at some point. "Posting to Facebook" won't be a thing.
KaiserPro 3 hours ago||||
A few things:

1) meta was doing this at scale before openAI

2) decent ML is critical to categorising content at scale. The more accurate and fast the categorisation, the finer the recommendations can be (ie instead of "woman, outside" as the tags for a video: woman, age, hair colour, location, subjects in view, main subject of video, video style). Doing that as fast as possible with as little energy as possible is mission critical

3) The llama leak basically evaporated the moat around openAI who _could_ have become a competitor

4) for the AR stuff, all of these models (and visual models) are required to make the platform work. They also need complete ownership so that it can be distilled to make it run on tiny hardware

5) dick swinging

6) they genuinely want to become an industrial behemoth, so robots, hardware, etc are now all in scope.

bee_rider 4 hours ago||||
I think they just want to be a winner in the “next thing.” They hit social networking, but missed mobile operating systems and didn’t compellingly win at social media. Eventually an ambitious person with a bazillion dollars wants a clear win, right?
storus 3 hours ago||||
Only thanks to Meta do we have competitive local LLMs. Without Llama nothing decent would have been released. Commoditize your complements in action.
yoz-y 4 hours ago||||
AI NPCs to fill in the empty Metaverse?
gallerdude 4 hours ago||||
I’m sure there’s more to it than this, but it feels like Zuck has pet interests like VR and now AI.
alex1138 4 hours ago||
But no account support, that's boring

Or any quality control (people missing posts)

Or banning the people who should be banned while leaving everyone else alone

This is Zuck: https://news.ycombinator.com/item?id=4151433 or https://news.ycombinator.com/item?id=10791198

aylmao 3 hours ago||||
First and most importantly is the fact they have a lot of very valuable data they wouldn't want to siphon to a competitor. This data is a key strategic asset in the space where they do business.

Secondly though, I think it has to do with the fact Meta is big enough to worry about vertical integration and full control of their business.

The whole reason they've been trying to make AR/VR happen for over a decade now is the assumption of a worst case and best case scenario. The worst case is Apple and Google want them gone. This isn't as far fetched as it seems: Google has historically been Meta's biggest competitor and even tried to release its own social network back when Meta was threatening them. If either pulls Meta apps from their respective stores, it'd be an immense blow to Meta; their whole trillion-dollar business depends on competitors' platforms.

Meta tried making inroads into the phone business but failed; it is a very crowded market after all. So they changed their strategy. Instead of playing catch-up, they'd invent "the next iPhone" and be the first to a brand new market. This is the best case scenario; they invent a new platform where they can be dominant from day 1 and stop depending on competitors' hardware, not only removing that risk factor for them, but also unlocking a new market they can control.

AI ties into all this because it appears to be key for this next platform to happen. You will communicate with these smart glasses via voice, hand gestures, or subtle movements that a model will have to interpret. The features that could make them stand out as more than just a screen on your face are all AI related: object detection, world understanding, context awareness, etc. If all this were done via a 3rd party, Meta would effectively be back to square one: the provider could easily yank away its model access, or sell it to a competitor. Meta would be again at the mercy of others.

Compared to other big-tech players, I think it's easy to see how Meta is in a riskier position. There's little Google or Microsoft can do to kill the iPhone. There's little Apple or Google can do to kill Amazon's online store. There's little Amazon or Apple can do to kill Microsoft's business deals. Google and Meta are primarily in the business of capturing people's data, attention, and selling ads, and both Google and Apple could do quite some damage to Meta. Beyond expanding it, it's important for them to invest in ways to protect their money-printing machine.

chairmansteve 4 hours ago||||
Pumps up the stock price.
addandsubtract 3 hours ago||||
To download all those torrents, obviously.
swyx 4 hours ago||||
you don't understand why zuck, who paid $1B for instagram when they had no revenue and 7 employees because he is paranoid about platform shifts, decided to join the race for (what seems quite possibly) the biggest platform shift in human history?
oceansky 4 hours ago|||
He also tried and failed to buy Snapchat, and then copied their feature on all their big products: Instagram, Facebook and even WhatsApp.
prodigycorp 4 hours ago|||
The way you put it, I understand it less. lol
awestroke 4 hours ago||||
Because Zuck has chronic FOMO, he's said as much himself
zeroonetwothree 4 hours ago|||
But then how will Zuck win the billionaire dick measuring contest?
throwaw12 4 hours ago||
> I don’t think Zuck’s vision is particularly compelling.

But he has to do it anyways, otherwise Meta can be disrupted easily.

Google and Apple have hardware and distribution channels for their products

Amazon has the marketplace and cloud

Microsoft has enterprise and cloud

Meta is always looking for ways to stay afloat

xnx 4 hours ago||
Meta has 3.5 billion daily active users
throwaw12 4 hours ago||
and has competitors like: TikTok, SnapChat, YouTube, Netflix, X, HBO, Amazon Prime, all fighting for the attention time.

They are worried something like Sora can disrupt them quickly

GalaxyNova 2 hours ago||
It is unfortunate that they decided to stop doing open-weight releases.

What could have been interesting has been reduced to simply another subpar LLM release.

spearman 1 hour ago||
Uploading images requires logging in. Logging in is broken. It redirects to https://meta.ai/?error=Token%20exchange%20failed and doesn't show any error message. Impressive.
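
For what it's worth, the redirect URL itself carries the failure reason in its `error` query parameter, percent-encoded. A minimal sketch of pulling it out with Python's standard library; the variable names are only illustrative, and this says nothing about how the site handles the redirect internally:

```python
from urllib.parse import urlparse, parse_qs

# The URL the login flow redirects to, copied from the comment above.
redirect_url = "https://meta.ai/?error=Token%20exchange%20failed"

# parse_qs percent-decodes the values and maps each key to a list of values.
query_params = parse_qs(urlparse(redirect_url).query)

# Fall back to a placeholder if the redirect carries no "error" parameter.
error_message = query_params.get("error", ["<no error parameter>"])[0]

print(error_message)
```

So the message is there, it just never makes it from the address bar onto the page.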
gritspants 3 hours ago||
I would like someone to tell me how stupid I am. If I were Meta/Zuck I'd open source a great model the moment my company developed it. This just looks like a pitch to investors, otherwise.
jamiequint 3 hours ago|
"This just looks like a pitch to investors"

The goal of public companies is generally to generate profit for their investors.

kzrdude 10 minutes ago|||
pitch to investors sounds like working for the opposite goal though - to convince investors to give more money to the company.
samrus 3 hours ago||||
I'm beginning to think that's the mantra we'll keep reciting as this whole country slowly falls apart
gritspants 2 hours ago||||
Thank you for telling me how stupid I am.
SoftTalker 3 hours ago|||
This is also the goal of private companies.
edwcross 3 hours ago|
What is the "BioTIER-refuse" thing mentioned in the "Bioweapons Refusal" graph?

I Googled it and found absolutely nothing.

Well, to be honest, I got 100% of websites containing the French word "boîtier" (box) with a typo.

Even on Google Scholar, the closest match is "BioTiER (Biological Training in Education and Research) Scholars Program", which is at least 10 years old and has nothing to do with that.

Is that an AI-generated image with an AI-generated name that has no physical existence?

EnderWT 3 hours ago|
https://securebio.org/biotier/