Mistral AI Releases Forge

Posted by pember 1 day ago

689 points | 174 comments | page 2

losvedir 10 hours ago|

> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.

I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.

mikodin 7 hours ago|

I was under this impression as well - I'd love to hear from someone who's deeper in the know about this!

todteera 12 hours ago||

Interesting how Mistral is investing in training models for industry-specific use cases. With the commoditization of intelligence by base models, they're probably looking to create value from specialized verticals.

jbverschoor 14 hours ago||

ASML and ESA as clients means something. I don't expect to see the first name anywhere else on a logo list.

alansaber 9 hours ago||

I find the Mistral "middle" between small LMs and 1T LMs compelling. Models that are sufficiently big to be performant but specialised for domains and tasks: this is what I assumed we'd always head towards.

andai 20 hours ago||

They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models, but I thought small models were not reliable for factual information?

simsla 19 hours ago|

Typical stages of training for these models are:

Foundational:

- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)

And sometimes...

- Some more customer-specific fine-tuning.

Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
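The "same loop, different dataset and optimiser settings" point can be sketched with a toy model. This is only an illustration under loud assumptions: a one-parameter model trained with plain SGD stands in for an LLM, and the `Stage` class, the stage names, the data, and the learning rates are all hypothetical. It says nothing about how Forge itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One training stage: a dataset plus optimiser settings (all hypothetical)."""
    name: str
    data: list      # (x, y) pairs, a stand-in for a real corpus
    lr: float       # learning rate used for this stage
    epochs: int


def train(weight: float, stage: Stage) -> float:
    """Plain SGD on squared error for the one-parameter model y = weight * x."""
    for _ in range(stage.epochs):
        for x, y in stage.data:
            grad = 2 * (weight * x - y) * x
            weight -= stage.lr * grad
    return weight


def run_pipeline(stages: list, weight: float = 0.0) -> float:
    """Run the stages in order; each one starts from the previous stage's weights."""
    for stage in stages:
        weight = train(weight, stage)
    return weight


# Toy data: the "general" corpus follows y = 2x, the "customer" corpus y = 3x.
general = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
customer = [(x, 3.0 * x) for x in (1.0, 2.0)]

stages = [
    Stage("pretraining", general, lr=0.01, epochs=200),
    Stage("customer fine-tuning", customer, lr=0.005, epochs=200),
]

final_weight = run_pipeline(stages)
```

The training function never changes between stages. Only the `Stage` passed in does, and the later stage pulls the weight away from what the earlier one learned.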

zby 15 hours ago||

My bet is that the solution to continuous learning lies in external storage. There is a lot of talk about context engineering, but I have not seen anyone treat context as the main bottleneck and build a system around that. This would show that even "context engineering" is kind of the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and the whole model of passing chat history back and forth is not the most efficient way of using the limited prompt.
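A back-of-the-envelope sketch of why resending chat history is costly. It assumes every turn adds the same number of tokens and ignores model output, prompt caching, and truncation; both function names and all numbers are made up for illustration.

```python
def resend_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens when every turn resends all earlier turns plus the new one."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))


def fixed_budget_tokens(turns: int, tokens_per_turn: int, retrieved_tokens: int) -> int:
    """Total prompt tokens when each turn sends only the new message
    plus a fixed-size slice pulled from external storage."""
    return turns * (tokens_per_turn + retrieved_tokens)


# Hypothetical session: 50 turns of 500 tokens each, 2000 retrieved tokens per turn.
full_history = resend_history_tokens(50, 500)
with_storage = fixed_budget_tokens(50, 500, 2000)
```

Under these assumptions the first total grows quadratically with the number of turns, since it equals `tokens_per_turn * turns * (turns + 1) / 2`, while the second grows linearly.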

mhl47 14 hours ago||

"External Storage" whatever that is can not be the same as continous learning as it does not have the strong connections/capture the interdepencies of knowledge.

That said, I think we will see more efforts, also on the business side, to have models that can help you build a knowledge base in some kind of standardized way that the model is trained to read. Or synthesize some sort of instructions on how to navigate your knowledge base.

Currently e.g. Copilot tries to navigate a hot mess of a MS knowledge graph that is very different for each company. And due to its amnesia it has to repeat the discovery in every session. No wonder that does not work. We have to either standardize or store somewhere (model, instructions) how to find information efficiently.

zby 13 hours ago||

The key to making Copilot useful is to take the limited context problem seriously enough. There are many dimensions to it: https://zby.github.io/commonplace/notes/context-efficiency-i... and it should be the starting point for designing systems that use LLMs extensively.

Centigonal 15 hours ago||

What do you mean when you say "external storage?"

zby 13 hours ago|||

A knowledge base - something where the LLM knows how to find the knowledge it needs for a given task. I am working on this idea in https://zby.github.io/commonplace/
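As a very rough sketch of "the LLM knows how to find the knowledge it needs", here is a keyword-overlap lookup over a handful of notes. This is not how the linked project works; the note titles, note contents, and the `tokenize`, `retrieve`, and `build_prompt` helpers are all made up for illustration, and a real system would likely use something better than word overlap.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, notes: dict, k: int = 2) -> list:
    """Return the titles of up to k notes sharing the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for title, body in notes.items():
        overlap = len(query_words & tokenize(title + " " + body))
        if overlap > 0:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


def build_prompt(query: str, notes: dict, k: int = 2) -> str:
    """Put only the retrieved notes in the prompt instead of the whole knowledge base."""
    titles = retrieve(query, notes, k)
    context = "\n\n".join(f"## {title}\n{notes[title]}" for title in titles)
    return f"{context}\n\nQuestion: {query}"


# Hypothetical knowledge base.
notes = {
    "deploy-checklist": "Steps to deploy the billing service to production.",
    "oncall-rotation": "Who is on call this week and how to page them.",
    "billing-schema": "Tables and columns used by the billing service.",
}

question = "How do I deploy the billing service?"
top_titles = retrieve(question, notes)
prompt = build_prompt(question, notes)
```

The point of the design is that the prompt holds only the slice of the knowledge base that the task needs, which is one way of treating the context limit as the main constraint.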

ithkuil 13 hours ago|||

A form of context engineering

hermit_dev 19 hours ago||

The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.

reverius42 16 hours ago||

Ironically that was also the past of AI. In 2016 it was all about specialized models (not just training data, everything including architecture and model class/type) for specific tasks and that's the way things had been for a long time.

Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?

Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.

holoduke 16 hours ago||

I don't think that's true. Nothing points to specialized LLMs being better. General purpose LLMs are just much more useful in daily work.

hermit_dev 9 hours ago||

To be more specific, I think the future is local and specialized. IBM among others thought the same way with their giant mainframe centralized computers and the original way people would utilize software in the 70s. It's an interesting parallel to today's cloud if you think about it. It's just not scalable from a resource (hardware), energy, and cost perspective. I think we're living in a unique time, but it's going to change. Without continued massive funding and a pivot to sustainability, things will (and should) change.

Don't get me wrong, general intelligence will always be important and should be a part of specialist models to a degree for understanding, but it doesn't make sense to use an 800B+ parameter model to help write an email or do research on company trends. Hell, look at what China has been able to do. Qwen 3.5 9B exceeds Claude 3.5 Haiku and nears Sonnet 3.5 levels. The 27B variation of Qwen 3.5 is superior to both in many ways and even rivals newer models. There is obviously an inherent lag, but we will gradually see a shift as these models become more capable.

Right now we are chasing 1-2% improvements at the cost of billions. Local models are already absurdly capable (more and more by the day, same with cloud of course) and smarter than most people in specific areas. Can we honestly say that most jobs require a PhD or higher level of understanding to perform? We're chasing something that is less and less needed from a general day-to-day perspective. AGI is outstanding, but not practical (at least today). I think we'll get there anyway on our current trajectory (though dangerous), but I suspect things will shift.

rorylawless 20 hours ago||

The fine tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning

aavci 19 hours ago|

Interesting to see. I thought they were promoting fine tuning

tho23i42342397 11 hours ago||

Interesting. Does this actually scale though? I've never seen enterprises which have "internal knowledge" in proper readable form - it's often in code, and more importantly in the people who wrote it.

I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read code or to reach out to those who wrote it. I don't know how it works in places that work with the "real world" like ASML.

Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).

Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.

bob001 7 hours ago|

My sense is that it sounds amazing in theory to executives who have never had to look at internal data themselves. In reality the internal knowledge base is a mix of incomplete, inaccurate, and out-of-date material, self-serving lies, and so on. At worst, the data is explicitly biased to hide reality from executives, so the AI will look extra good to executives. Of course, a business that makes all tactical decisions based on lies is not going to do well.

thecopy 12 hours ago|

Looks interesting. But how to explore or test or use? The product page (https://mistral.ai/products/forge) also does not contain anything useful. Just "Contact us"

Disappointing.
