Posted by jrandolf 13 hours ago
The LLMs are completely private (we don't log any traffic).
The API is OpenAI-compatible (we run vLLM), so you just swap the base URL. Currently offering a few models.
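If you haven't used an OpenAI-compatible endpoint before, it really is just the stock client with a different base URL. A minimal sketch in Python (the endpoint URL and key are placeholders, and the model id is the one quoted elsewhere in this thread, so check the actual docs):

    from openai import OpenAI

    # Placeholder endpoint and key; substitute the real values from the service.
    client = OpenAI(
        base_url="https://api.example.com/v1",
        api_key="YOUR_API_KEY",
    )

    # Model id taken from this thread; the served id may differ.
    resp = client.chat.completions.create(
        model="deepseek-v3.2-685b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)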
Also, the mobile version is a bit broken, but good idea and good luck!
For $40, I'd get 20 tok/s × 2.6M seconds = 52M tokens of DeepSeek v3.2 per month, and only if I ran it 24/7, which isn't realistic for most workloads.
On OpenRouter [1], $40 buys 105M tokens of the same model, more than double the 52M, and I can spend them whenever I choose.
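Spelling the comparison out (the 105M figure is just what OpenRouter's pricing worked out to for me [1]):

    # Flat-rate plan: 20 tok/s around the clock for a month.
    seconds_per_month = 30 * 24 * 3600         # ~2.6M seconds
    flat_rate_tokens = 20 * seconds_per_month  # only if run 24/7

    # Pay-as-you-go: what $40 buys on OpenRouter for the same model [1].
    openrouter_tokens = 105_000_000

    print(flat_rate_tokens)                      # 51,840,000 (~52M)
    print(openrouter_tokens / flat_rate_tokens)  # ~2.0x more tokens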
> deepseek-v3.2-685b, $40/mo/slot for ~20 tok/s, 465 slots total
> 465 users × 20 tok/s = 9,300 tok/s needed
> The node peaks at ~3,000 tok/s total. So at full capacity they can really only serve:
> 3,000 ÷ 20 = 150 concurrent users at 20 tok/s
> That's only ~32% of the cohort able to be active simultaneously at the promised speed.
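Their arithmetic checks out; a quick sanity check using the quoted figures:

    slots = 465        # paid slots
    per_slot = 20      # tok/s promised per slot
    node_peak = 3_000  # tok/s the node can sustain, per the quote

    needed = slots * per_slot           # 9,300 tok/s if all slots are active
    concurrent = node_peak // per_slot  # 150 users at the promised speed
    print(concurrent / slots)           # ~0.32, i.e. 32% of the cohort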
I dig the idea! I'm curious where the costs will land with actual use.
That's on the order of 1,000 words per minute if it were being typed at you. If that's too slow for your use case, then perhaps $5/mo is just not for you.
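For the conversion, assuming the common rule of thumb of roughly 0.75 words per token:

    tok_per_s = 20
    words_per_token = 0.75  # rough heuristic; varies by tokenizer and language
    print(tok_per_s * words_per_token * 60)  # 900.0, on the order of 1,000 words/min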
I kinda like the idea of paying $5/mo for unlimited usage at the specified speed.
It beats 10x the speed that hits daily limits after about 2 hours and weekly limits after 3 days.
I mean, my local 122B only does about 20 tok/s, so it can be used for background stuff, but not for anything interactive IME.
What are you running that local 122B on? This looks attractive to me at $5/mo for unlimited usage at 20-25 tok/s, but if I could buy hardware to get that running locally, I wouldn't mind doing so.