Posted by Palmik 4 days ago
What's exciting is that there's still so much room for improvement. We benchmark around 5K total tokens/s with the ShareGPT dataset and 12K total tokens/s with random 2000/100, using vLLM under high concurrency.
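For context, numbers like these typically come from vLLM's bundled serving benchmark. A sketch of the two workloads mentioned (flag names follow vLLM's benchmarks/benchmark_serving.py; the model name, dataset path, and prompt count are placeholders, not from the comment):

```shell
# Assumes a vLLM OpenAI-compatible server is already running, e.g.:
#   vllm serve deepseek-ai/DeepSeek-V3 --tensor-parallel-size 8

# ShareGPT workload: prompt/response lengths drawn from real conversations
python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model deepseek-ai/DeepSeek-V3 \
  --dataset-name sharegpt \
  --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 1000

# "random 2000/100": synthetic prompts, 2000 input / 100 output tokens
python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model deepseek-ai/DeepSeek-V3 \
  --dataset-name random \
  --random-input-len 2000 \
  --random-output-len 100 \
  --num-prompts 1000
```

The benchmark reports request throughput and total token throughput, which is where "total tokens/s" figures like the above come from.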
DeepSeek-V3/R1 Inference System Overview [2] quotes "Each H800 node delivers an average throughput of 73.7k tokens/s input (including cache hits) during prefilling or 14.8k tokens/s output during decoding."
Yes, DeepSeek deploys a different inference architecture. But this goes to show just how much room there is for improvement. Looking forward to more open source!
[1] https://developers.redhat.com/articles/2025/03/19/how-we-opt...
[2] https://github.com/deepseek-ai/open-infra-index/blob/main/20...
> Codebase Divergence: Our engine is based on an early fork of vLLM from over a year ago. Although structurally similar, we’ve heavily customized it for DeepSeek models, making it difficult to extend for broader use cases.
I've been there. Probably a few of us have.
Their approach of splitting out maintainable sublibraries and sharing information directly, even when it can't be integrated upstream, seems like a really nice way of working with the community -- i.e., they have obstacles, but they're not letting those obstacles push them into the easy route of not contributing at all. Someone wanting to use their techniques might prefer working code over write-ups, but it's still knowledge sharing, and again, it would be easier for them not to do it at all. So kudos to them.
The fact they share back some of their improvements is great.
Why did Google publish the Transformer architecture instead of keeping it to themselves?
I understand that people may want to do good things for humanity, facilitate progress, etc. But if an action goes against commercial interest, how can company management take it without getting objections from shareholders?
Or is there a commercial logic that motivates sharing information and intellectual property? What logic is that?
When you're an engineer at the tier of these AI researchers, winning an extra 100k/year on top of your current 500k (numbers out of my ass) isn't worth it compared to the name recognition. Being known as one of the authors of the transformer, for example, will let you work with other bright-minded people and create even better things.
So essentially these commercial companies have "we'll let you publish papers when you work for us" as a perk.
Also, instead of an extra 100k a year, you get to raise a billion dollars in VC funds for your next company
1. Goodwill and mindshare. If you're known as "the best" or "the most innovative", then you'll attract customers.
2. Talent acquisition. Smart people like working with smart people.
3. Becoming the standard. If your technology becomes widely adopted, and you've been using it the longest, then you're suddenly the best placed in your industry to make use of the technology while everyone else retools.
4. Deception. Sometimes you publish work that's "old" internally but is still state of the art. This provides your competition with a false sense of where your research actually is.
5. Freeride on others' work. Maybe experimenting with extending an idea is too expensive/risky to fund internally? Perhaps a wave of startups will try. Acquire one of them that actually makes it work.
6. Undercut the market leader. If your industry has a clear market leader, the others can use open source to cooperate to erode that leadership position.
There absolutely is a sound commercial justification to share research: long-term growth through advancement of the field. (Deep learning would never have made the progress it has without open research!)
If this seems quaint, it’s because we’re too accustomed to short-term, transactional, Wall Street thinking.
Might as well get some dubious medium-term gain rather than spend a bunch of money on security for nothing.
For very good reason, because that's exactly how they behave in all other areas. The question remains, why do they appear altruistic when it comes to sharing papers?
I find it hard to believe that it's actual altruism. It's far more likely that it's transactional behavior that just appears altruistic from the outside.
Out of all of the companies in the world, I wouldn't put Google near the bottom of the list in terms of stuff they've discovered and released to the world.
I heard that Dodge v. Ford Motor Co. was an important precedent in the US. https://en.m.wikipedia.org/wiki/Dodge_v._Ford_Motor_Co.
My Wikipedia link above in turn links to https://en.m.wikipedia.org/wiki/Shareholder_primacy, which says in the last paragraph: "The doctrine waned in later years."
This probably confirms what you say, but I'd be interested to learn about specific cases.
Also, lots of copyright abolitionists in AI. Many people who work in the space delight in the idea of making information, especially their own, free.
The ghost of Aaron Swartz runs through every researcher in this space.
I used to work in such a restrictive environment. Nobody worth their salt stayed long.
Because they make their money from advertisements. Not their AI models. Same for Meta.
Compare that to e.g. OpenAI, which is trying to make money from its AI models and is thus underbid by Google and Meta.
Plenty of companies are in this position.
Please just open source anyway with a note saying "we won't be maintaining this, but feel free to fork!"
Been there with AOSP, but that won't be changing anytime soon. I highly doubt noobs will learn open-source etiquette, unfortunately.
Kind of like how biological information is always trying to find new places to reproduce itself. Viruses and fungi don't come with ToS agreements and EULAs. :)
The incidence of bugs -- it not understanding what you're asking, or just generating code that is straight-up wrong -- is much worse. Even with guidance it will often be unable to fix issues, leaving you to do all the manual legwork to get things working. Usually you're better off having done everything yourself from the start.
During those two months they really improved GPT as well: its generation speed is now much, much faster, and the quality of its output has become a lot better.
What type of coding are you doing? Did you roll your own coding assistant with a local DeepSeek model, or are you prompting via the web?
I sometimes feel guilty though. With all this power, I’m just bounded by lack of ideas and execution.