Posted by meander_water 11/2/2025

Tongyi DeepResearch – open-source 30B MoE Model that rivals OpenAI DeepResearch(tongyi-agent.github.io)
365 points | 153 comments
incomingpain 11/3/2025|
I love this one: https://github.com/LearningCircuit/local-deep-research

I tied it together with Qwen3 30B thinking. It's very easy to get up and running, but a lot of the default settings are surprisingly low; you need to boost the iterations and context. It's especially easy if you already run SearXNG locally.

I haven't finished tuning the settings, but a detailed report takes ~20 minutes and has given pretty good results so far, similar to OpenAI's Deep Research. Mine often cites ~100 sources.

But something I have noticed: the model didn't seem to be what mattered. The magic was more so in the project itself, going deep with higher iterations and more results.

brutus1213 11/2/2025||
I recently got a 5090 in a machine with 64 GB of RAM (Intel CPU) and was looking for a strong model to host locally. If I could match GPT-4o's performance, I'd be content. Any suggestions, or cases where people were disappointed?
bogtog 11/2/2025||
GPT-OSS-20B at 4 or 8 bits is probably your best bet, with Qwen3-30B-A3B probably the next best option. Maybe there exists some 1.7- or 2-bit version of GPT-OSS-120B.
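As a rough sanity check for what fits on a 32 GB card, a weight-only back-of-envelope estimate looks like the sketch below. It ignores the KV cache and activations (which also need room), and the ~10% overhead factor is an assumption, not a measured number:

```python
def quantized_size_gb(params_billions: float, bits: float,
                      overhead: float = 1.1) -> float:
    """Weight-only memory estimate: params * bits/8, plus ~10% for
    embeddings/buffers. Ignores KV cache, which grows with context."""
    return params_billions * bits / 8 * overhead

# The models mentioned above, against a 32 GB card like the 5090:
for name, params, bits in [
    ("GPT-OSS-20B @ 8-bit", 20, 8),
    ("GPT-OSS-20B @ 4-bit", 20, 4),
    ("Qwen3-30B-A3B @ 4-bit", 30, 4),
    ("GPT-OSS-120B @ 2-bit", 120, 2),
]:
    size = quantized_size_gb(params, bits)
    print(f"{name}: ~{size:.0f} GB -> {'fits' if size < 32 else 'too big'}")
```

By this estimate the 20B and 30B models fit comfortably, while even a 2-bit 120B (~33 GB of weights alone) is already over budget, which is why MoE CPU offload comes up below.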
p1esk 11/2/2025||
The 5090 has 32 GB of VRAM. Not sure if that's enough to fit this model.
IceWreck 11/2/2025|||
llama.cpp supports offloading some of a MoE model's experts to the CPU. The results are very good, and even weaker GPUs can run larger models at reasonable speeds.

See the `n-cpu-moe` option in https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...

svnt 11/2/2025|||
It should fit enough of the layers to make it reasonably performant.
whatpeoplewant 11/3/2025||
Great to see an open 30B MoE aimed at "deep research." These shine in a multi-agent setup: run parallel agentic workers (light models for browsing/extraction) and reserve the 30B model for planning, tool routing, and verification, keeping latency and cost in check while boosting reliability. MoE specialization fits distributed agentic AI well, but you'll want orchestration for retries/consensus, plus task-specific evals on multi-hop web research, to guard against brittle routing and hallucinations.
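The retries/consensus part can be sketched as a majority-vote wrapper around the light workers. Everything here is illustrative; `worker` is a hypothetical stand-in for whatever model call your stack actually makes:

```python
from collections import Counter
from typing import Callable

def with_consensus(worker: Callable[[str], str], query: str,
                   votes: int = 3, retries: int = 2) -> str:
    """Ask several light workers the same question and accept an answer
    only when a strict majority agrees; otherwise retry the round."""
    for _ in range(retries + 1):
        answers = [worker(query) for _ in range(votes)]
        answer, count = Counter(answers).most_common(1)[0]
        if count > votes // 2:
            return answer
    raise RuntimeError(f"no consensus after {retries + 1} rounds: {answers}")

# Toy worker that always agrees, just to show the flow:
print(with_consensus(lambda q: "42", "What is the answer?"))
```

In a real stack the heavy 30B planner would only see answers that survived the vote, which is one cheap way to damp per-call hallucinations.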
andai 11/3/2025||
Tongyi provides this model on OpenRouter, including a free version.

https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b

https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b:fr...

VladVladikoff 11/3/2025||
Recently I gave Deep Research a list of 300 links and asked it to go through each one and analyze a certain question about them. It repeatedly took shortcuts and didn't actually work through the full list. Is this caused by context window limits? Or does OpenAI limit request size? Is it possible to avoid these kinds of limits with locally hosted models?
oofbey 11/3/2025|
I’ve also had extremely poor luck getting any LLM agent to work through a long list of repetitive tasks. Don’t know why. I’d guess it’s because they’re trained for transactional responses, and are thus horrible at anything repetitive.
ukuina 11/3/2025||
Very much this.

You are better off asking it to write a script that invokes itself N times across the task list.
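That pattern might look like the sketch below, where `ask` is a hypothetical placeholder for a single model invocation (API call, CLI, whatever). The point is that the loop, not the model, guarantees every item gets processed:

```python
def research_each(links, ask):
    """Run one model call per link instead of one call for all 300.
    `ask` is a placeholder for a single LLM invocation; the model can't
    skip an item because it never sees the whole list at once."""
    results = {}
    for link in links:
        results[link] = ask(f"Analyze this page for the question: {link}")
    return results

# Stub `ask` just to show the shape; swap in a real client call.
out = research_each(
    ["https://example.com/a", "https://example.com/b"],
    ask=lambda prompt: "ok",
)
print(out)
```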

threecheese 11/4/2025||
Same. I think there’s an untapped market (a feature, really) here; if it isn’t solved by GPT-next, it will start to reveal itself as a problem more and more.

LLMs are really bad at being comprehensive in general, and their comprehensiveness varies wildly from one inference to the next. Because LLMs keep surprising the hell out of everyone with their abilities, less attention is paid to this: they can do a thing well, and for now that’s good enough. As usage scales, I expect this gap to become more obvious and problematic (unless it's solved in the model, like everything else).

A solution I’ve been toying with is something like a reasoning step, which could probably be done with mostly classical NLP, that identifies constraints up front and guides the inference to meet them. Like structured output, but at the session level.
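One minimal way to sketch that session-level constraint check, using nothing but classical string matching (all names here are illustrative, not from any real framework):

```python
import re

def missing_items(required: list[str], response: str) -> list[str]:
    """Which required items does the response fail to mention?
    Plain substring/regex matching is enough for this check --
    no model call needed to verify coverage."""
    return [item for item in required
            if not re.search(re.escape(item), response, re.IGNORECASE)]

required = ["link-001", "link-002", "link-003"]
draft = "Covered link-001 and LINK-003 in detail."
print(missing_items(required, draft))
# A driver loop would now re-prompt only for the missing items and merge.
```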

I am currently doing what you suggest, though: I have the agent create a script that invokes … itself … until the constraints are met. But that obviously requires me to stay engaged; I think it could be done autonomously, with at least much better consistency (at the end of the day, even that guiding hand is inference-based and therefore subject to the same challenges).

zwaps 11/3/2025||
The OpenAI numbers are a red herring anyway.

On most plans, Deep Research is capped at around 20 sources, which in many cases makes it the least useful research agent, in particular worse than a thinking-mode GPT-5 query.

blueboo 11/3/2025||
When was the last time you ran a Deep Research? Good agents just do research as necessary. I find GPT-5 Pro >> all the top DR agents.
ugh123 11/3/2025||
Slightly off topic, but why does word wrapping seem to be broken on this site? (Chrome on Android.)
rippeltippel 11/3/2025|
Thank you for pointing that out, I was about to ask the same. It's giving my OCD a hard time reading it.
krystofee 11/2/2025||
Isn't it a huge deal that this 30B model can match and even surpass huge closed models?
whiplash451 11/2/2025|
Has anyone tried running this on a 5090 or 6000 pro? What throughput do you see?