Posted by meander_water 11/2/2025
I set it up with qwen3 30b thinking. Very easy to get it up and running, but a lot of the default numbers are shockingly low; you need to boost iterations and context. Especially easy if you already run searxng locally.
I haven't finished tuning the actual settings, but for the detailed report it takes ~20 minutes and so far has given pretty good results, similar to OpenAI's Deep Research. Mine often pulls in ~100 sources.
But something I have noticed: the model didn't seem to be what mattered. The magic was more in the project itself, going deep with higher iterations and more results.
--n-cpu-moe in https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
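For context: --n-cpu-moe keeps the MoE expert weights of the first N layers on the CPU so the rest of the model fits in VRAM. A sketch of an invocation (the model file, layer count, and context size here are placeholders, not from this thread):

    llama-server -m Qwen3-30B-A3B-Thinking-Q4_K_M.gguf \
        --n-cpu-moe 20 -c 32768 -ngl 99

Raise --n-cpu-moe until the GPU allocation fits; more expert layers on CPU means less VRAM used but slower inference.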
https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b
https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b:fr...
You are better off asking it to write a script to invoke itself N times across the task list.
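A minimal sketch of that pattern, assuming a hypothetical "agent" CLI and a plain-text task list with one task per line:

    # driver.py - one focused invocation per task instead of asking
    # for everything in a single inference ("agent" is hypothetical).
    import subprocess

    with open("tasks.txt") as f:
        tasks = [line.strip() for line in f if line.strip()]

    for i, task in enumerate(tasks, 1):
        out = subprocess.run(
            ["agent", "--prompt", task],
            capture_output=True, text=True, check=True,
        )
        with open(f"result_{i:03}.txt", "w") as fh:
            fh.write(out.stdout)

Each run gets a full attention budget for a single task rather than splitting it across the whole list.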
LLMs are really bad at being comprehensive in general, and from one inference to the next their comprehensiveness varies wildly. Because LLMs are surprising the hell out of everyone with their abilities, less attention is paid to this; they can do a thing well, and for now that's good enough. As we scale usage, I expect this gap will become more obvious and problematic (unless it's solved in the model, like everything else).
A solution I've been toying with is something like a reasoning step, which could probably be done with mostly classical NLP, that identifies constraints up front and guides the inference to meet them. Like structured output, but at the session level.
I am currently doing what you suggest, though: I have the agent create a script which invokes … itself … until the constraints are met. But that obviously requires that I stay engaged; I think it could be done autonomously, with at least much better consistency (at the end of the day even that guiding hand is inference-based and therefore subject to the same challenges).
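As a sketch of what that autonomous loop might look like, with a naive regex source count standing in for the real constraint check (the "agent" CLI and the numbers are hypothetical):

    # loop.py - re-invoke the model until a session-level constraint
    # is met; here the constraint is a minimum number of sources.
    import re
    import subprocess

    MIN_SOURCES = 100  # constraint identified up front

    def count_sources(report: str) -> int:
        # naive proxy: count distinct cited URLs in the draft
        return len(set(re.findall(r"https?://\S+", report)))

    report = ""
    for attempt in range(10):  # hard cap so it can't loop forever
        prompt = (f"Extend this report to cite at least {MIN_SOURCES} "
                  f"distinct sources.\n\nCurrent draft:\n{report}")
        out = subprocess.run(
            ["agent", "--prompt", prompt],
            capture_output=True, text=True, check=True,
        )
        report = out.stdout
        if count_sources(report) >= MIN_SOURCES:
            break

The check itself is deterministic code rather than another inference, which is the point: the guiding hand stops being subject to the same variance.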
For most plans, Deep Research is capped at around 20 sources, which in many cases makes it the least useful research agent, in particular worse than a thinking-mode GPT-5 query.