Posted by chabons 3 hours ago

Muse Spark: Scaling towards personal superintelligence (ai.meta.com)
https://meta.ai/
156 points | 206 comments
tty456 2 hours ago|
I don't get the comments trashing this. If it slightly beats or even matches Opus 4.6, it means Meta is capable of building a model competitive with the leading AI company. Sure, they spent a lot of money and will have ongoing costs. But how much more work would it take to turn that into a coding agent people are willing to try (and pay for) alongside their usage of a collection of agents (Claude, Codex, etc)? It also means Meta doesn't have to pay another company to use a SOTA model across all their products (including IG, WhatsApp, and VR), which will matter to their balance sheet long term (despite the constant R&D spend).
prodigycorp 2 hours ago||
Comments trashing this are skeptics who are right to remember the benchmaxxing of Llama 4. This model was out in the wild as early as a couple months ago, but they didn't release it because it was at Gemini 2.5 Pro levels.
zozbot234 2 hours ago||
The llama4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller and denser models at the time; we should know better these days.
dilap 1 hour ago|||
Deepseek R1 was a publicly available MoE model that was getting a ton of attention before llama4. Llama4 didn't get much attention because it wasn't good.
prodigycorp 2 hours ago|||
the models were objectively horrible
NitpickLawyer 2 hours ago||
They really weren't horrible. They were ~gpt4o, with the added benefit that you could run them on premise. Just "regular" non-"thinking" models. Inefficient architecture (the ratio of active to total parameters), but otherwise "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened; it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.
refulgentis 1 hour ago|||
Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.

Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.

prodigycorp 2 hours ago|||
Nah, I remember how disgusted I felt trying Llama 4 Maverick and Scout. They were both DOA; they couldn't even beat much smaller local models.
refulgentis 1 hour ago||
I'll cosign what you said; simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so...off...due to conventional wisdom and God-King Zuck misunderstanding his own company and the resulting overreaction.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] my understanding is the longest heads-up was single-digit days, if any. Most modellers have arrived at 2+ weeks now; there's a lot between spitting out logits and parsing and delivering a response.

modeless 25 minutes ago|||
It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.
redox99 2 hours ago|||
> If it slightly beats or even matches Opus 4.6

It doesn't though

ryeguy_24 2 hours ago||
Curious on why you think this. Any data points that led you to this?
howdareme 2 hours ago||
The benchmarks they released
johnfn 10 minutes ago||
What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.
ChipopLeMoral 2 hours ago||
> I don't get the comments trashing this.

People like to hate on Meta regardless of anything, and regardless of whether it's justified or not. Not saying it isn't, just that it's many people's default bias.

glerk 1 hour ago||
Personal as in Meta gets your personal data so they can sell you more ads.
2pointsomone 1 hour ago|
[flagged]
hackrmn 2 hours ago||
The hero image on the linked page, which consists of a muted teal background with the words "Introducing Muse Spark", weighs in at 3.5MB. I don't even...
KerrickStaley 1 hour ago||
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

- Hacker News Guidelines https://news.ycombinator.com/newsguidelines.html

gobdovan 13 minutes ago|||
It's at least Meta-relevant. Compression Represents Intelligence Linearly (Y Huang, 2024)
yawnxyz 51 minutes ago|||
I think this speaks to the product release itself
fleabitdev 32 minutes ago|||
Good catch - looks like it's a PNG image, with an alpha channel for the rounded corners, and a subtle gradient in the background. The gradient is rendered with dithering, to prevent colour banding. The dither pattern is random, which introduces lots of noise. Since noise can't be losslessly compressed, the PNG is an enormous 6.2 bits per pixel.

While working on a web-based graphics editor, I've noticed that users upload a lot of PNG assets with this problem. I've never tracked down the cause... is there a popular raster image editor which recently switched to dithered rendering of gradients?
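The "noise can't be losslessly compressed" point is easy to sanity-check with DEFLATE alone (the same compressor PNG uses internally, minus PNG's per-row filters). A minimal stdlib-only Python sketch, with invented image dimensions and a +/-1 random dither standing in for whatever the editor actually does:

```python
# Compare DEFLATE-compressed size of a smooth gradient vs. the same
# gradient with per-pixel random dither noise. The noise carries
# near-maximal entropy, so compression barely dents it.
import random
import zlib

W, H = 512, 512

# Smooth horizontal gradient, one byte per pixel (rows are identical,
# so LZ77 matches compress them to almost nothing).
smooth = bytes(x * 255 // (W - 1) for x in range(W)) * H

# Same gradient with +/-1 random dithering noise added per pixel.
random.seed(0)
dithered = bytes(
    max(0, min(255, (x * 255 // (W - 1)) + random.choice((-1, 0, 1))))
    for _ in range(H)
    for x in range(W)
)

for name, data in (("smooth", smooth), ("dithered", dithered)):
    packed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(packed)} bytes "
          f"({8 * len(packed) / len(data):.2f} bits/pixel)")
```

On this synthetic image the dithered version compresses an order of magnitude worse than the smooth one, which is the same mechanism behind the 6.2 bits-per-pixel hero image.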

Overpower0416 1 hour ago|||
lol it literally took me 2s to google "optimize image for website" and 10s to upload and get a smaller image.

The result for that specific image: 500KB, an ~85% decrease in size.

BugsJustFindMe 1 hour ago|||
An indistinguishable JPG is 170KB. An SVG would be 20KB.
levocardia 1 hour ago||
CSS with a linear gradient background would be even smaller :)
sofixa 1 hour ago|||
You can even automatically do that on your CDN/delivery/web server layer. Or as part of your web deployment pipeline.
Overpower0416 1 hour ago||
Yes, but it might be a little too advanced for Meta ;)
re-thc 59 minutes ago||
But they have personal superintelligence?
hungryhobbit 2 hours ago|||
Someday our robot overlords will be intelligent enough to ... optimize images!

(But today is not that day.)

zfol_510 1 hour ago|||
And it doesn't even look high-res.
Invictus0 2 hours ago||
complaining about sand on the beach
fooqux 1 hour ago|||
It's not sand on the beach, it's garbage on the beach.
hackrmn 2 hours ago|||
I am simply offended. By Meta's lack of sensibilities (or ability) towards use of images on the Web while touting their new flavour of artificial intelligence as a product.
Invictus0 1 hour ago||
old man shouts at cloud
hackrmn 1 hour ago||
more like old man shouts at someone else's computer
creddit 2 hours ago||
Ran some of my internal benchmarks against this and I'm very unimpressed. I don't think this moves them into the OAI v Anthropic v Gemini conversation at all.

Major analytical errors in their responses to several of my technical questions.

creddit 2 hours ago|
Playing with this some more and it's actively not good. Just basic mathematical errors riddling responses. Did some basic adversarial testing where its responses are analyzed by Gemini, and Gemini is finding basic math errors in every relatively simple ask I make (relative to what Opus, Gemini, or GPT can handle). Yikes.
daft_pink 2 hours ago||
This really reinforces the idea that the AI race and the Railroad Mania of the 19th century are very similar.

So many different companies are going to have similarly powerful AI that there will be no moat around it and it will be cheap. They will never earn their investment back.

cheriot 1 hour ago||
I suspect this is the real reason Anthropic limits subscriptions to their own products and keeps API prices several times higher than comparable models. Applications are stickier than API users, and less technical users are stickier than programmers (i.e. Cowork is stickier than Code).
netcan 38 minutes ago||
Anthropic generally seem more into living within market discipline and market signals of some sort. Products with margins, even if it's sort of irrelevant considering R&D costs and capital inflow.

That said, there's nothing like the real thing.

The risk is something like the railroad and dotcom bubbles: over-investment, circular revenue, and a timeline that doesn't work.

Or, maybe it'll work out.

dist-epoch 2 hours ago||
The moat is in the compute and the energy access.

And further down the line in chips, which is why Elon is building a fab now.

There are plenty of capable models on HuggingFace, yet I have no way of running them.

khalic 2 hours ago|||
Give it a few years, or months. Tiny models are getting outrageously good
spprashant 1 hour ago|||
I wonder if this is why the tech cartel is buying up all the hardware?

If the average user gets convinced they could run LLMs for cheap at home, you cannot trap users in your walled garden anymore.

mobattah 2 hours ago|||
Exactly. We’ll see the cost of AI continue to drop.

I was saying this for years about Tesla’s FSD - they finally had to give in and drop the price to stay competitive.

cedws 1 hour ago||||
That fab will never be delivered. In five years you might see the manufacturing equivalent of a person dancing in spandex.
nutjob2 1 hour ago|||
> which is why Elon is building a fab now

At least he says he's doing that. It doesn't really make sense, since you're not going to reach an advanced node from a standing start in a practical time frame or at a practical cost.

Sounds like more Musk flavored vapor.

re-thc 57 minutes ago||
> It doesn't really make sense, since you're not going to reach an advanced node from a standing start in a practical time frame or at a practical cost.

They already announced a partnership with Intel.

nutjob2 41 minutes ago||
Oh the irony.
yalogin 1 hour ago||
Meta is in a weird spot. They caught up late to the game, and instead of releasing Llama as a chatbot they open sourced it, precisely because they had lost the mind share. They thought a chatbot wasn't their product, and I'm sure they regret it now. Mark is obsessed with becoming the Android of something: he poured billions into the metaverse thinking he was first, and failed. He then open sourced Llama and wanted to be the Android of LLMs. He ended up enabling Groq, but it didn't benefit Meta directly at all. They have no revenue or mind-share path from LLMs but continue to pour billions into them. The only 1-1 mapping is with the glasses, but that's a tough fit for the company given they are extremely allergic to privacy and security.

Not sure what this is now.

gardnr 12 minutes ago||
The llama weights were leaked. It open sourced itself.

You are right though. Meta could have been in lockstep releasing ChatGPT features into some chat bot on Facebook.com but instead it seemed like their FAIR arm was hell bent on commoditising this stuff by publishing their research models before the Chinese companies took the lead in that.

It’s hard for me to be mad at FAIR even though I generally disagree with the outcomes that Meta produces for their users.

zozbot234 1 hour ago||
Llama is available as a chatbot in WhatsApp.
throwaw12 3 hours ago||
How is it that Meta spent so much money on talent and hardware, but the model barely matches Opus 4.6?

Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else is dumber compared to the talent Anthropic has

strulovich 2 hours ago||
Meta made a bunch of mistakes, and it looks like Zuckerberg spent a lot of money on talent and made big swings to change things (that happened about a year ago)

I think it’s unrealistic to expect them to come back from that pit to the top in one year, but I wouldn’t rule them out getting there with more time. That’s a possible future. They have the money and Zuckerberg’s drive at the helm. It can go a long way.

coffeebeqn 2 hours ago|||
Matching Opus 4.6 would be pretty good? It’s the SOTA actually available model
reissbaker 2 hours ago||
Muse Spark doesn't even match GLM-5.1 on most benchmarks. And GLM is open source!
solenoid0937 2 hours ago|||
It's benchmaxxed.

If they actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)

throwaw12 2 hours ago||
how do you know it's benchmaxxed?
solenoid0937 2 hours ago|||
Friends at Meta with access to the model + personal experience at Meta.

Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.

luma 2 hours ago||||
For one, they aren't using the latest versions of many of the benchmarks, e.g. ARC-AGI 2 and not 3, etc.
prodigycorp 2 hours ago|||
meta's benchmaxxing tendencies are well known. llama4 was mega benchmaxxed; there's nothing that suggests to me that meta's culture has changed.
spindump8930 1 hour ago||
Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
impulser_ 2 hours ago|||
It's not even on par with Sonnet. It's on par with open source models, and it's not even open source; it sits behind a private-preview API.

Might as well not release anything.

username223 2 hours ago|||
Facebook is working with the talent that can’t find a job at some other company. It doesn’t surprise me they ship mediocrity.
wotsdat 3 hours ago|||
[dead]
zozbot234 2 hours ago||
> has some secret sauce

Yup, it's called test-time compute. Mythos is described as plenty slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.

throwaw12 2 hours ago||
> it's called test-time compute.

Why can't others easily replicate it?

coder68 2 hours ago||
I have not delved into the theory yet, but it seems the smaller open-source models already do this to an extent. They have fewer parameters but spend much more time/tokens reasoning, as a way to close the performance gap. If you look at "tokens per problem" on https://swe-rebench.com/ it seems to be the case, at least.
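As a toy illustration of that tradeoff (all prices and token counts below are invented, not taken from the leaderboard), a cheaper model that spends 3x the reasoning tokens can still come out ahead on per-problem cost:

```python
# Hypothetical per-problem cost: price per million tokens times tokens
# spent per problem. Both models' numbers are made up for illustration.
models = {
    "big":   {"usd_per_mtok": 15.0, "tokens_per_problem": 20_000},
    "small": {"usd_per_mtok": 2.0,  "tokens_per_problem": 60_000},
}

for name, m in models.items():
    cost = m["usd_per_mtok"] * m["tokens_per_problem"] / 1_000_000
    print(f"{name}: ${cost:.2f} per problem")
```

The catch, as the Mythos comparison upthread suggests, is latency: the extra tokens show up as wall-clock time in quick-feedback-loop agentic work.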
bguberfain 2 hours ago||
We all know it... but I think they were very bold in this warning about using your private messages to train public models. _Your messages with AIs will be used to improve AI at Meta. Don't share information, including sensitive topics, about others or yourself that you don't want the AI to retain and use_
discopicante 2 hours ago|
meta doesn't exactly instill confidence on using personal data responsibly. hard pass
moab 3 hours ago||
"Muse Spark is available now, and Contemplating mode will be rolling out gradually in meta.ai."

How does one get their hands on these models? They are not open-source, right? I go to meta.ai, but it's just a chat interface, no equivalent to Codex or Claude Code. Can you use this through OpenCode? Is Meta charging for model access, or is the gathering of chat data a sufficiently large tithe?

meetpateltech 3 hours ago||
"It will be available in private preview via API to select partners, and we hope to open-source future versions of the model."

from Facebook Newsroom: https://about.fb.com/news/2026/04/introducing-muse-spark-met...

tempaccount420 2 hours ago||
I can't think of any "select partners" that would want to use this non-SOTA model. Just put it on OpenRouter.
giancarlostoro 2 hours ago||
If Microsoft is a select partner, maybe they could shove it into Copilot for VS or something, but yeah, I'm wondering the same, maybe Apple could be one of their partners too?
monkeydust 3 hours ago|||
TBD, it seems. So far the only explained usage pattern is through a Meta product (WhatsApp, Facebook, Instagram).
moab 3 hours ago||
So to verify their claims and see how strong these models are, the answer is "believe us"?

Note: I'm expressing some skepticism here largely due to how recent rollouts from Meta flopped. Sincerely hoping that they do better this time around!

nemomarx 3 hours ago||
I assume the answer is to try it out in chat mode? You could run your usual benches through that, right?
pstuart 3 hours ago||
I appreciate that they build this stuff for their own benefit, but I don't want to feed even more of my private info. Hopefully the models will become public or lead to equivalent models from other sources.
hvass 1 hour ago|
Genuine question: Why release this the day after Mythos? It does not appear SOTA (just based on benchmarks). OpenAI will likely release Spud tomorrow.
eranation 1 hour ago|
That's a really good question. My sarcastic mind thinks Anthropic rushed the Mythos announcement out of fear of Meta stealing their thunder... (I guess someone leaked it; a LOT of Anthropic folks are ex-Meta... so, you know)

Just a speculation, I have no real knowledge about it.
