Posted by EvanZhouDev 23 hours ago
https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF
Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la...
The eye-opener is the clean, licensed training data, with filters for AI-generated content (not sure how you do that).
If MSFT builds up using an ethical approach, there is a large anti-AI audience that might take note.
That scroll effect is jank city for me (yeah yeah works fine in Chrome/Edge).
Let me slide as fast and unrestricted as I want. I do not want to "transition" to the next paragraph.
This trend needs to stop.
https://microsoft.ai/wp-content/uploads/2026/06/main_2026060...
https://microsoft.ai/news/building-a-hillclimbing-machine-la...
Unless they specifically clarify that the testing and training benchmarks are completely separate, we have to assume they test on the same 'hill' the model climbs.
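To make the concern concrete, here is a minimal sketch of what "completely separate" could look like: a deterministic split of benchmark task IDs into a set you iterate against and a holdout you only score at the end. Everything here (`hashId`, `splitTasks`, `overlap`, the task ID format, the 20% fraction) is hypothetical and not taken from the model card or the blog post.

```javascript
// Hypothetical sketch: keep the tasks you hillclimb on apart from the tasks
// you report on. Names and the 20% holdout fraction are assumptions.

// FNV-1a-style 32-bit hash so the split depends only on the task ID.
function hashId(id) {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Assign each task to "holdout" or "climb" based on its hash bucket in [0, 1).
function splitTasks(taskIds, holdoutFraction = 0.2) {
  const climb = [];
  const holdout = [];
  for (const id of taskIds) {
    const bucket = hashId(id) / 0x100000000;
    (bucket < holdoutFraction ? holdout : climb).push(id);
  }
  return { climb, holdout };
}

// Return the IDs that appear in both lists; an empty array means no leakage.
function overlap(a, b) {
  const seen = new Set(a);
  return b.filter(id => seen.has(id));
}

const taskIds = Array.from({ length: 50 }, (_, i) => `task-${i}`);
const split = splitTasks(taskIds);
console.log(split.climb.length, split.holdout.length, overlap(split.climb, split.holdout));
```

Because the split is a pure function of the ID, rerunning it cannot quietly move a task from one side to the other between iterations. It says nothing about whether the training data itself contains the benchmark, which is a separate check.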
The early testers have confirmed that it is much better than all earlier US open weights models, but it is not as good as the best Chinese open weights models.
While Nemotron 3 Ultra is not the smartest open weights LLM, it is well optimized for fast inference, so it is much faster than the other LLMs of the same size.
In any case, I think it is very good to have another option among big open weights LLMs: experience so far shows that even when one model is clearly better on average, the weaker model can still be better for particular applications.
With open weights models, you can afford to try multiple LLMs for the more important tasks and then choose the best solution.
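A rough sketch of that "try several, keep the best" loop, assuming each model is just a function from prompt to text and that you have some task-specific way to score an answer. `pickBest`, the stub models, and the length-based scorer are all made up for illustration; real use would call local inference and score with something meaningful, such as tests passed.

```javascript
// Hypothetical sketch: run one task through several models and keep the best answer.
// `models` maps a label to any function that turns a prompt into text.
function pickBest(models, prompt, score) {
  let best = null;
  for (const [name, generate] of Object.entries(models)) {
    const output = generate(prompt);
    const value = score(output);
    // Keep the first candidate, then replace it only on a strictly higher score.
    if (best === null || value > best.value) {
      best = { name, output, value };
    }
  }
  return best;
}

// Stubs standing in for real local models (assumption, not a real API).
const stubModels = {
  modelA: prompt => `${prompt} -> short answer`,
  modelB: prompt => `${prompt} -> a longer, more detailed answer`,
};

// Toy scorer that prefers longer output; swap in a task-specific metric.
const byLength = text => text.length;

const result = pickBest(stubModels, 'Refactor parser', byLength);
console.log(result.name, result.value);
```

The scorer is the hard part; the loop itself is trivial, which is rather the point when the weights are local and the extra runs cost only compute.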
The pure-AI companies like OpenAI and Anthropic are hoping to sell you API access to cloud-based AI, perhaps running on NVIDIA chips, but it seems NVIDIA's plan may be for you to run local AI, maybe from NVIDIA, running on local NVIDIA chips.
Do you have any insight into the actual technical details that make this sort of thing possible? I want to learn more about model architectures. Does it have to do with attention mechanisms or sparsity or something?
I was hoping Microsoft would make it open weights, as they have done for years with the Phi models.
The era of big tech releasing models into the wild might be over, which IMO is counter-productive, as we are shifting from "the model is the product" to "the harness is the product"
Why not showcase it against something in a similar domain like qwen3.6 or gemma 4?
Seems like getting from a good system design to working code is practically solved.
Now it’s a matter of the design of the system. Or is that represented in these evals?
Even if I had no idea, going with the default suggestion would not be a terrible mistake, assuming you did describe your requirements relatively well.
When I need a light model, I reach for Sonnet. It is nearly free on the max plans, and quite fast. I don't see a place for Haiku in regular coding.
Haiku, I guess, is for when you need summarization/categorization at scale.
Microsoft setting Haiku as the benchmark is a low bar.
is a funny oxymoron
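(() => {
  // Wheel and touch events that scroll-hijacking scripts typically listen for
  // (legacy event names included).
  const KILL = ['wheel', 'mousewheel', 'DOMMouseScroll', 'touchmove'];
  // Stop the event from reaching later listeners. preventDefault() is never
  // called, so the browser's native scrolling still runs.
  const block = e => e.stopImmediatePropagation();
  for (const t of KILL) {
    // Capture-phase listeners on window fire before listeners deeper in the DOM.
    window.addEventListener(t, block, { capture: true, passive: true });
    document.addEventListener(t, block, { capture: true, passive: true });
  }
  // Drop the Lenis class names from <html> so its scroll CSS no longer applies.
  document.documentElement.classList.remove('lenis','lenis-smooth','lenis-scrolling','lenis-stopped');
  console.log('Scroll hijack disabled; native scrolling restored.');
})();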