Posted by EvanZhouDev 23 hours ago

MAI-Code-1-Flash (microsoft.ai)
https://microsoft.ai/models/mai-code-1-flash/

https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF

Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la...

517 points | 243 comments | page 2
ChicagoDave 3 hours ago|
I’m not sure the message should be benchmarking.

The eye-opener is clean licensed data with filters for AI content (not sure how you do that).

If MSFT builds up using an ethical approach, there is a large anti-AI audience that might take note.

efields 22 hours ago||
Please test your websites in Safari. Almost all of your iOS users use it by default, and the desktop experience is pretty close to the mobile experience, so testing is easy.

That scroll effect is jank city for me (yeah yeah works fine in Chrome/Edge).

whalesalad 21 hours ago|
Some kind of scroll hijack going on for sure; it feels terrible on Firefox + macOS.
HDBaseT 19 hours ago||
I instantly close websites which use this weird scroll hijacking and slow animation nonsense.

Let me slide as fast and unrestricted as I want. I do not want to "transition" to the next paragraph.

This trend needs to stop.

OsrsNeedsf2P 23 hours ago||
So it's trained on the SWE Bench Pro eval set
topsycatt 21 hours ago||
That's not accurate. Take a look at the paper to see what it is trained on! Decontamination is called out specifically in A.4.

https://microsoft.ai/wp-content/uploads/2026/06/main_2026060...
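
For readers unfamiliar with the term: a decontamination check typically looks for benchmark text that has leaked into training data. A minimal sketch in JavaScript, assuming a simple word n-gram overlap test. This is an illustration only, not the procedure described in A.4; the function names `ngrams` and `isContaminated` and the 8-word window are hypothetical choices.

```javascript
// Hypothetical sketch: flag a training document that shares any long word
// n-gram with a benchmark item. Not the method from the paper.

// Build the set of lowercase word n-grams for a piece of text.
function ngrams(text, n) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set();
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.add(tokens.slice(i, i + n).join(' '));
  }
  return grams;
}

// Return true if the training document shares at least one n-gram
// with any benchmark item. The default window of 8 words is an assumption.
function isContaminated(trainingDoc, benchmarkItems, n = 8) {
  const docGrams = ngrams(trainingDoc, n);
  return benchmarkItems.some(item => {
    for (const gram of ngrams(item, n)) {
      if (docGrams.has(gram)) return true;
    }
    return false;
  });
}
```

Under this sketch, a document for which `isContaminated(doc, benchmarkItems)` returns true would be dropped from the training set. Exact matching like this misses paraphrased or reformatted copies, so a real pipeline would likely need extra normalization on top.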

lemonish97 23 hours ago||
What is your evidence for this claim?
fooker 23 hours ago||
They say hill climbing

https://microsoft.ai/news/building-a-hillclimbing-machine-la...

Unless they specifically clarify that the testing and training benchmarks are completely separate, we have to assume they test on the same 'hill' the model climbs.

artemisart 21 hours ago|||
Hill climbing doesn't mean much, and it certainly doesn't imply they cheat on benchmarks. They have more details here: https://microsoft.ai/news/introducing-mai-thinking-1/ (it seems to be "RL on everything").
tosh 23 hours ago||
Not open weight, or at least I did not find anything indicating open weight.
adrian_b 10 hours ago||
Tomorrow NVIDIA will publish Nemotron 3 Ultra, which will be the biggest open weights LLM from a US company (550B parameters).

The early testers have confirmed that it is much better than all earlier US open weights models, but it is not as good as the best Chinese open weights models.

While Nemotron 3 Ultra is not the smartest open weights LLM, it is well optimized for fast inference, so it is much faster than the other LLMs of the same size.

In any case I believe that it is very good to have an additional option in big open weights LLMs, because until now all existing models have shown that even if some model is definitely better on average than another, the weaker model can still be better in some particular applications.

With open weights models, you can afford to try multiple LLMs for the more important tasks and then choose the best solution.

HarHarVeryFunny 4 hours ago|||
NVIDIA seem to be following a smart Intel-like strategy of selling chips and also creating software that helps create demand for those chips. With Intel it was things like MKL, IPP, OpenCV etc, and with NVIDIA it is not just CUDA and development libraries but also models like Nemotron.

The pure-AI companies like OpenAI and Anthropic are hoping to sell you API access to cloud-based AI, perhaps running on NVIDIA chips, but it seems NVIDIA's plan may be for you to run local AI, maybe from NVIDIA, running on local NVIDIA chips.

anthonypasq 2 hours ago|||
> it is well optimized for fast inference

Do you have any insight into the actual technical details that make this sort of thing possible? I want to learn more about model architectures. Does it have to do with attention mechanisms or sparsity or something?

ggcr 21 hours ago||
:(

I was hoping Microsoft would make it open weights, as they have done for years with the Phi models.

The era of big tech releasing models into the wild might be over, which IMO is counter-productive, as we are shifting from "the model is the product" to "the harness is the product".

deckar01 22 hours ago||
If only they had launched that yesterday I might have avoided Copilot auto model selection using a 9x model, quietly burning my monthly quota in a single afternoon.
tgtweak 3 hours ago||
Is anyone using Haiku 4.5?

Why not showcase it against something in a similar domain, like Qwen3.6 or Gemma 4?

mentos 22 hours ago||
Shouldn’t the next model focus be on system design rather than code?

Seems like going from a good system design to code is practically solved.

Now it’s a matter of the design of the system. Or is that represented in these evals?

dist-epoch 22 hours ago|
Have you tried system design with LLMs? I find them pretty good at suggesting 5 architectures for a problem and then iterating on the solutions.

Even if you had no idea, going with the default suggestion would not be a terrible mistake, assuming you described your requirements reasonably well.

AJRF 22 hours ago||
Copilot brand is tarnished, so time to bung everything under MAI?
layer8 20 hours ago|
Maybe the next Windows update will change This PC back to MAI Computer. ;)
jnwatson 17 hours ago||
I had to remind myself what Haiku is even for. Anthropic hasn't spent a lot of recent marketing on it.

When I need a light model, I reach for Sonnet. It is nearly free on the max plans, and quite fast. I don't see a place for Haiku in regular coding.

Haiku I guess is when you need summarization/categorization at scale.

Microsoft setting Haiku as the benchmark is a low bar.

lemonish97 17 hours ago|
> "It is nearly free on the max plans"

is a funny oxymoron

ajyoon 23 hours ago|
Scroll wheel hijacked on this entire domain
grav 22 hours ago||
Fix:

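  (() => {
    // Event types that scroll-hijacking scripts typically listen for.
    const KILL = ['wheel', 'mousewheel', 'DOMMouseScroll', 'touchmove'];
    // Stop the event before other listeners see it. preventDefault() is
    // never called, so the browser's native scrolling still happens.
    const block = e => e.stopImmediatePropagation();
    for (const t of KILL) {
      // Capture phase, so these run before listeners further down the tree.
      window.addEventListener(t, block, { capture: true, passive: true });
      document.addEventListener(t, block, { capture: true, passive: true });
    }
    // Remove the 'lenis' smooth-scroll classes from <html>.
    document.documentElement.classList.remove('lenis', 'lenis-smooth', 'lenis-scrolling', 'lenis-stopped');
    console.log('Scroll hijack disabled; native scrolling restored.');
  })();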
sethops1 21 hours ago||
The fix is to close the tab.
matchbok3 23 hours ago|||
Yeah this website is horrendous to use. What were they thinking?
BadBadJellyBean 22 hours ago||
You mean "what was the LLM thinking?"