Posted by meander_water 1 day ago
--n-cpu-moe, in https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
You are better off asking it to write a script to invoke itself N times across the task list.
LLMs are really bad at being comprehensive in general, and their comprehensiveness varies wildly from one inference to the next. Because LLMs are surprising the hell out of everyone with their abilities, less attention is paid to this; they can do a thing well, and for now that's good enough. As usage scales, I expect this gap will become more obvious and problematic (unless it's solved in the model, like everything else).
A solution I’ve been toying with is something like a reasoning step, which could probably be done with mostly classical NLP, that identifies constraints up front and guides the inference to meet them. Like a structured output but at a session level.
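A minimal sketch of that constraint-extraction idea, using nothing but regex as the "classical NLP" pass. All names here are hypothetical, and a real version would need far richer patterns:

```python
import re

def extract_constraints(prompt: str) -> list[dict]:
    """Pull explicit quantifier constraints ('all 12', 'at least 3', 'each')
    out of a prompt before inference starts."""
    constraints = []
    for m in re.finditer(r"\b(all|each|every|at least)\b(?:\s+(\d+))?", prompt, re.I):
        constraints.append({
            "quantifier": m.group(1).lower(),
            "count": int(m.group(2)) if m.group(2) else None,
        })
    return constraints

def satisfied(constraints: list[dict], items_produced: int) -> bool:
    """Check a run's output against the extracted numeric constraints."""
    return all(
        c["count"] is None or items_produced >= c["count"]
        for c in constraints
    )
```

The extracted constraints then act like a session-level schema: the driver keeps re-prompting until `satisfied()` returns True, instead of trusting a single pass.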
I am currently doing what you suggest, though: I have the agent create a script which invokes … itself … until the constraints are met, but that obviously requires that I stay engaged. I think it could be done autonomously, with at least much better consistency (at the end of the day even that guiding hand is inference-based and therefore subject to the same challenges).
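The driver loop behind that "invoke itself until done" pattern can be sketched like this, where `run_once` is a hypothetical stand-in for one agent/CLI invocation (e.g. a subprocess call) that reports whether it handled a task:

```python
def drive(tasks, run_once, max_rounds=10):
    """Re-invoke a single-shot agent run until every task in the list is
    covered, instead of trusting one run to be comprehensive.

    run_once(task) stands in for one agent invocation; it returns True
    on success. Each round retries only what is still unfinished."""
    pending = list(tasks)
    for _ in range(max_rounds):
        pending = [t for t in pending if not run_once(t)]
        if not pending:
            return True
    return False  # gave up; some tasks still unmet
```

The point of the outer loop is exactly the comprehensiveness gap described above: any single inference may skip items, so completion is checked mechanically rather than assumed.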
For most plans, Deep Research is capped at around 20 sources, making it in many cases the least useful research agent, often worse than a thinking-mode GPT-5 query.
I switch between Gemini and ChatGPT whenever I feel one fails to fully grasp what I want; I do coding in Claude.
How are they supposed to become the trillion-dollar company they want to be, with strong competition and open-source disruptions every few months?
Arguably, with LLMs it is (1) far easier to switch between models than it is today to switch between AWS / GCP / Azure systems, and (2) switching costs will drop rapidly for porting your legacy systems to new ones (i.e. the whole business model of Oracle, etc.).
Meanwhile, the whole world is building more chip fabs, data centers, AI software/hardware architectures, etc.
Feels more like we're headed toward commoditization of the compute layer than toward a few giant AI monopolies.
And if true, that's actually even more exciting for our industry and "letting 100 flowers bloom".
The underlying architecture isn't special, and the underlying skills and tools aren't special.
There is nothing openAI brings to the table other than a willingness to lie, cheat, and steal. That only gives you an edge for so long.
I tied it together with Qwen3 30B Thinking. Very easy to get it up and running, but a lot of the default numbers are shockingly low: you need to boost iterations and context. It's especially easy if you already run SearXNG locally.
I haven't finished tuning the actual settings, but the detailed report takes ~20 minutes and so far has given pretty good results, similar to OpenAI's Deep Research. Mine often cites ~100 sources.
But something I have noticed: the model didn't seem to be the important part. The magic was more in the project itself, i.e. going deep with higher iterations and more results.
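The iterate-and-deduplicate loop that gets you from ~20 to ~100 unique sources can be sketched like this. Everything here is illustrative rather than the project's actual code: `search` is a stub standing in for a SearXNG query (SearXNG's real JSON API is a GET on `/search?q=...&format=json`), and in the real pipeline the next query would come from the model's analysis of the previous round:

```python
def deep_research(topic, search, iterations=5, per_query=25):
    """Run several search rounds, deduplicating sources by URL, so the
    final report can cite many unique sources rather than one page of hits."""
    seen, sources = set(), []
    query = topic
    for i in range(iterations):
        for result in search(query, limit=per_query):
            if result["url"] not in seen:
                seen.add(result["url"])
                sources.append(result)
        # hypothetical refinement: a real run would derive the next query
        # from the model's reading of this round's results
        query = f"{topic} details round {i + 2}"
    return sources
```

This matches the observation above: the source count is driven almost entirely by `iterations` and `per_query`, not by which model does the summarizing.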