Posted by jrandolf 13 hours ago
The LLMs are completely private (we don't log any traffic).
The API is OpenAI-compatible (we run vLLM), so you just swap the base URL. Currently offering a few models.
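If you haven't used an OpenAI-compatible endpoint before, it really is just the stock client with a different base URL. A minimal sketch in Python (the endpoint URL and key are placeholders, and the model id is the one quoted elsewhere in this thread, so check the actual docs):

    from openai import OpenAI

    # Placeholder endpoint and key; substitute the real values from the service.
    client = OpenAI(
        base_url="https://api.example.com/v1",
        api_key="YOUR_API_KEY",
    )

    # Model id taken from this thread; the served id may differ.
    resp = client.chat.completions.create(
        model="deepseek-v3.2-685b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)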
Also, the mobile version is a bit broken, but good idea and good luck!
For $40, I'd get 20 tok/s × 2.6M seconds = 52M tokens of DeepSeek v3.2 per month, and only if I ran it 24/7, which isn't realistic for most workloads.
On OpenRouter [1], $40 buys 105M tokens of the same model, more than double the 52M, and I can spend them whenever I choose.
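Spelling the comparison out (the 105M figure is just what OpenRouter's pricing worked out to for me [1]):

    # Flat-rate plan: 20 tok/s around the clock for a month.
    seconds_per_month = 30 * 24 * 3600         # ~2.6M seconds
    flat_rate_tokens = 20 * seconds_per_month  # only if run 24/7

    # Pay-as-you-go: what $40 buys on OpenRouter for the same model [1].
    openrouter_tokens = 105_000_000

    print(flat_rate_tokens)                      # 51,840,000 (~52M)
    print(openrouter_tokens / flat_rate_tokens)  # ~2.0x more tokens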
> deepseek-v3.2-685b, $40/mo/slot for ~20 tok/s, 465 slots total
> 465 users × 20 tok/s = 9,300 tok/s needed
> The node peaks at ~3,000 tok/s total. So at full capacity they can really only serve:
> 3,000 ÷ 20 = 150 concurrent users at 20 tok/s
> That's only ~32% of the cohort able to be active simultaneously at the promised speed.
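Their arithmetic checks out; a quick sanity check using the quoted figures:

    slots = 465        # paid slots
    per_slot = 20      # tok/s promised per slot
    node_peak = 3_000  # tok/s the node can sustain, per the quote

    needed = slots * per_slot           # 9,300 tok/s if all slots are active
    concurrent = node_peak // per_slot  # 150 users at the promised speed
    print(concurrent / slots)           # ~0.32, i.e. 32% of the cohort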
I dig the idea! I'm curious where the costs will land with actual use.
That's on the order of 1,000 words per minute if it were being typed at you. If that's too slow for your use case, then perhaps $5/mo is just not for you.
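For the conversion, assuming the common rule of thumb of roughly 0.75 words per token:

    tok_per_s = 20
    words_per_token = 0.75  # rough heuristic; varies by tokenizer and language
    print(tok_per_s * words_per_token * 60)  # 900.0, on the order of 1,000 words/min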
I kinda like the idea of paying $5/mo for unlimited usage at the specified speed.
It beats 10x the speed that hits daily limits after about 2 hours and weekly limits after 3 days.
I mean, my local 122B only does about 20 tok/s, so it can be used for background stuff, but not for anything interactive IME.
What are you running that local 122B on? This looks attractive to me at $5/mo for unlimited usage at 20-25 tok/s, but if I could buy hardware to get that running locally, I wouldn't mind doing so.