I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
It's feasible for small models, but I thought small models were not reliable for factual information?
Foundational:
- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)
And sometimes...
- Some more customer-specific fine-tuning.
Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
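To make the "same loop, different dataset" point concrete, here is a toy sketch in plain Python. It is not how any real lab trains an LLM: the model is a 1-D linear fit, and `train`, `mse`, `pretrain_data` and `finetune_data` are all made-up names for illustration. The only thing that changes between the two stages is the data passed in and the learning rate.

```python
import random

def train(weights, dataset, lr, steps, seed=0):
    """Plain SGD on squared error for a toy 1-D linear model y = w*x + b."""
    rng = random.Random(seed)
    w, b = weights
    for _ in range(steps):
        x, y = rng.choice(dataset)
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def mse(weights, dataset):
    """Mean squared error of the model on a dataset."""
    w, b = weights
    return sum(((w * x + b) - y) ** 2 for x, y in dataset) / len(dataset)

# Hypothetical stand-ins for the two corpora (assumptions, not real data).
pretrain_data = [(k / 10, 2.0 * (k / 10)) for k in range(-10, 11)]        # y = 2x
finetune_data = [(k / 10, 2.0 * (k / 10) + 0.5) for k in range(-10, 11)]  # y = 2x + 0.5

# "Pretraining": start from scratch on the first dataset.
base = train((0.0, 0.0), pretrain_data, lr=0.1, steps=2000)

# "Fine-tuning": the same function, starting from the pretrained weights,
# with the dataset swapped and a smaller learning rate.
tuned = train(base, finetune_data, lr=0.01, steps=2000, seed=1)

print("base  on fine-tune data:", mse(base, finetune_data))
print("tuned on fine-tune data:", mse(tuned, finetune_data))
```

The second call reuses `train` unchanged, which is the sense in which supervised fine-tuning is "just" a dataset swap plus optimiser tweaks.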
That said, I think we will also see more efforts on the business side to have models that can help you build a knowledge base in some kind of standardized way that the model is trained to read. Or synthesize some sort of instructions on how to navigate your knowledge base.
Currently e.g. Copilot tries to navigate a hot mess of a MS knowledge graph that is very different for each company. And due to its amnesia it has to repeat the discovery in every session. No wonder that does not work. We have to either standardize or store somewhere (model, instructions) how to find information efficiently.
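As a rough illustration of the "store somewhere how to find information" option, here is a minimal sketch that persists discovery results to a JSON file so a later session can reuse them. Everything in it is hypothetical (`NavigationNotes`, `find`, `slow_discovery`, the file name, the `wiki/...` location); it says nothing about how Copilot or any real product works.

```python
import json
import os
import tempfile

class NavigationNotes:
    """Hypothetical on-disk notes mapping a topic to where it was found."""

    def __init__(self, path):
        self.path = path
        self.notes = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.notes = json.load(f)

    def lookup(self, topic):
        return self.notes.get(topic)

    def record(self, topic, location):
        self.notes[topic] = location
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.notes, f, indent=2)

def find(topic, notes, discover):
    """Reuse a stored location if there is one; otherwise discover and store it."""
    known = notes.lookup(topic)
    if known is not None:
        return known
    location = discover(topic)
    notes.record(topic, location)
    return location

# Stand-in for the expensive per-session exploration (an assumption).
calls = []
def slow_discovery(topic):
    calls.append(topic)
    return f"wiki/{topic}"

path = os.path.join(tempfile.mkdtemp(), "nav_notes.json")

session1 = NavigationNotes(path)   # first session: has to explore
first = find("expense-policy", session1, slow_discovery)

session2 = NavigationNotes(path)   # new session: reads the notes from disk
second = find("expense-policy", session2, slow_discovery)

print(first, second, "discovery calls:", len(calls))
```

The second session never calls `slow_discovery`, because the notes file carries the result across the "amnesia" boundary.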
Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?
Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.
Don't get me wrong, general intelligence will always be important and should be part of specialist models to a degree, for understanding, but it doesn't make sense to use an 800B+ parameter model to help write an email or do research on company trends. Hell, look at what China has been able to do. Qwen 3.5 9B exceeds Claude 3.5 Haiku and nears Sonnet 3.5 levels. The 27B variation of Qwen 3.5 is superior to both in many ways and even rivals newer models. There is obviously an inherent lag, but we will gradually see a shift as these models become more capable.
Right now we are chasing 1-2% improvements at the cost of billions. Local models are already absurdly capable (more so by the day, same with cloud of course) and smarter than most people in specific areas. Can we honestly say that most jobs require PhD-level understanding or higher? We're chasing something that is less and less needed from a general day-to-day perspective. AGI is outstanding, but not practical (at least today). I think we'll get there anyway on our current trajectory (though that is dangerous), but I suspect things will shift.
I recall that even at Google, with its own search engine and so on, the best way to understand anything was to read the code or to reach out to those who wrote it. I don't know how it works in places that work with the "real world" like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
Disappointing.