Posted by albelfio 17 hours ago

Tinybox – A powerful computer for deep learning(tinygrad.org)
527 points | 296 comments
bastawhiz 16 hours ago|
There's no way the red v2 is doing anything with a 120b parameter model. I just finished building a dual A100 AI homelab (80GB VRAM combined with NVLink). Similar stats otherwise. 120b only fits with very heavy quantization, enough to make the model schizophrenic in my experience. And there's no room for KV, so you'll OOM around 4k of context.

I'm running a 70b model now that's okay, but it's still fairly tight. And I've got 16GB more VRAM than the red v2.

I'm also confused why this is 12U. My whole rig is 4u.

The green v2 has better GPUs. But for $65k, I'd expect a much better CPU and 256gb of RAM. It's not like a threadripper 7000 is going to break the bank.

I'm glad this exists but it's... honestly pretty perplexing
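The napkin math behind the "heavy quantization" point, as a quick Python sketch (weights only, decimal GB; KV cache and activations come on top):

```python
# Back-of-envelope: weight memory for a 120B-parameter model at common
# quantization levels. This is weights only; KV cache and activations
# need additional VRAM on top of these figures.
PARAMS = 120e9

for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4), ("q2", 2)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name:>4}: {gb:5.0f} GB")
```

At q4 the weights alone are ~60GB, so 80GB of VRAM leaves little headroom once the cache and runtime overhead are in.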

oceanplexian 16 hours ago||
It will work fine but it’s not necessarily insane performance. I can run a q4 of gpt-oss-120b on my Epyc Milan box that has similar specs and get something like 30-50 Tok/sec by splitting it across RAM and GPU.

The thing that's less useful is the 64G VRAM/128G system RAM config: even the large MoE models only need ~20B for the router, so the rest of the VRAM is essentially wasted (mixing experts between VRAM and/or system RAM has basically no performance benefit).

syntaxing 15 hours ago||
Splitting across RAM and GPU impacts it more than you think. I would be surprised if the red box doesn't outperform you by 2-3X for both PP and TG
overfeed 14 hours ago|||
> I'm also confused why this is 12U. My whole rig is 4u.

I imagine that's because they're buying a single SKU for the shell/case. Their answer to your question would presumably be: in order to keep prices low and quality high, we don't offer any customization to the server dimensions.

ottah 12 hours ago||
That's just a massively oversized server for the number of GPUs. It's not like they're doing anything special either. I can buy an appropriately sized Supermicro chassis myself and throw some cards in it. They're really not adding enough value to justify overspending on anything.
randomgermanguy 2 hours ago||
The major selling point of the tinyboxes is that you're able to run them in your office without any hassle.

I used to own a Dell Poweredge for my home-office, but those fans even on minimal setting kept me up at night

ericd 13 hours ago|||
Was that cheaper than a Blackwell 6000?

But yeah, 4x Blackwell 6000s are ~32-36k, not sure where the other $30k is going.

bastawhiz 12 hours ago|||
I bought the A100s used for a little over $6k each.
ericd 11 hours ago||
Oh, why'd you go that route? Considering going beyond 80 gigs with nvlink or something?
segmondy 12 hours ago|||
folks have more money than sense, gpt-oss-120b full quant runs on my quad 3090 at 100tk/sec and that's with llama.cpp; with vllm it will probably run at 150tk/sec, and that's without batching.
amarshall 12 hours ago|||
You're almost certainly (definitely, in fact) confusing the 120b and 20b models.
Aurornis 10 hours ago||||
> gpt-oss-120b full quant runs on my quad 3090

A 120B model cannot fit on 4 x 24GB GPUs at full quantization.

Either you're confusing this with the 20B model, or you have 48GB modded 3090s.

ericd 11 hours ago|||
How're you fitting a model made for 80 gig cards onto a GPU with 24 gigs at full quant?
Havoc 3 hours ago|||
He said quad 3090 not single
zozbot234 11 hours ago|||
Offloading MoE layers to CPU inference is the easiest way, though a bit of a drag on performance
ericd 11 hours ago||
Yeah, I'd just be pretty surprised if they were getting 100 tokens/sec that way.

EDIT: Either they edited that to say "quad 3090s", or I just missed it the first time.

Aurornis 10 hours ago|||
> There's no way the red v2 is doing anything with a 120b parameter model.

I don't see the 120B claim on the page itself. Unless the page has been edited, I think it's something the submitter added.

I agree, though. The only way you're running 120B models on that device is either extreme quantization or by offloading layers to the CPU. Neither will be a good experience.

These aren't a good value buy unless you compare them to fully supported offerings from the big players.

It's going to be hard to target a market where most people know they can put together the exact same system for thousands of dollars less and have it assembled in an afternoon. RTX 6000 96GB cards are in stock at Newegg for $9000 right now which leaves almost $30,000 for the rest of the system. Even with today's RAM prices it's not hard to do better than that CPU and 256GB of RAM when you have a $30,000 budget.

zozbot234 16 hours ago|||
> And there's no room for kv, so you'll OOM around 4k of context.

Can't you offload KV to system RAM, or even storage? It would make it possible to run with longer contexts, even with some overhead. AIUI, local AI frameworks include support for caching some of the KV in VRAM using an LRU policy, so the overhead would be tolerable.

tcdent 16 hours ago|||
Not worth it. It is a very significant performance hit.

With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCI bus with the high bandwidth layers like KV cache, you eliminate a lot of the performance benefit that you get from having fast memory near the GPU die.

zozbot234 14 hours ago||
> With that said, people are trying to extend VRAM into system RAM or even NVMe storage

Only useful for prefill (given the usual discrete-GPU setup; iGPU/APU/unified memory is different and can basically be treated as VRAM-only, though a bit slower), since the PCIe bus becomes a severe bottleneck as soon as you offload more than a tiny fraction of the memory workload to system memory/NVMe. For decode, you're better off running entire layers (including expert layers) on the CPU, which local AI frameworks support out of the box. CPU-run layers can in turn offload model parameters/KV cache to storage as a last resort, but if you offload too much to storage (insufficient RAM cache), that overhead dominates and basically everything else becomes irrelevant.

bastawhiz 12 hours ago||||
The performance already isn't spectacular with it running all in vram. It'll obviously depend on the model: MoE will probably perform better than a dense model, and anything with reasoning is going to take _forever_ to even start beginning its actual output.
ranger_danger 16 hours ago|||
I know llama.cpp can, it certainly improved performance on my RAM-starved GPU.
ottah 12 hours ago||
Honestly, two RTX 8000s would probably have a better return on investment than the red v2. I have an eight-GPU server: five RTX 8000, three RTX 6000 Ada. For basic inference, the 8000s aren't bad at all. I'm sure the green with four RTX PRO 6000s is dramatically faster, but there's a $25k markup I don't honestly understand.
ivraatiems 16 hours ago||
There's some irony in the fact that this website reads as extremely NOT AI-generated, very human in the way it's designed and the tone of its writing.

Still, this is a great idea, and one I hope takes off. I think there's a good argument that the future of AI is in locally-trained models for everyone, rather than relying on a big company's own model.

One thought: The ability to conveniently get this onto a 240v circuit would be nice. Having to find two different 120v circuits to plug this into will be a pain for many folks.

solarkraft 14 hours ago||
I find that the most respected writing about AI has very few signs of being written by AI. I'm guessing that's because people in the space are very sensitive to the signs and signal vs. noise.
rimeice 12 hours ago|||
And because people writing anything worth reading are using the process of writing to form a proper argument and develop their ideas. It’s just not possible to do that by delegating even a small chunk of the work to AI.
Aperocky 14 hours ago|||
I found it useful to preface with

* this section written by me typing on keyboard *

* this section produced by AI *

And usually both exist in documents and lengthy communications. This gets what I wanted across with exactly my intention, and then I can attach a 10x-length AI appendix that serves as helpful indexing and references.

jolmg 11 hours ago||
> attach 10x length worth of AI appendix that would be helpful indexing and references.

Are references helpful when they're generated? The reader could've generated them themselves. References would be helpful if they were personal references of stuff you actually read and curated. The value then would be getting your taste. References from an AI may well be good-looking nonsense.

cgio 6 hours ago||
I agree wholeheartedly, I don’t see any balance in the effort someone dedicated to generating text vs me consuming it. If you feel there’s further insight to be gained by an llm, give me the prompt, not the output. Any communication channel reflects a balance of information content flowing and we are still adjusting to the proper etiquette.
jofzar 11 hours ago|||
Good? That's what I want out of all websites. I don't want to read what an AI believes is the best thing for a website, I want to know the honest truth.
agnishom 8 hours ago|||
I don't view this as irony. This seems like good sense in understanding when AI usage will make things better and when it will not.
Lerc 16 hours ago|||
I am a little surprised that they openly solicit code contributions with "Invest with your PRs" but don't have any statement on AI contributions.

Maybe the volume for them is ok that well-intentioned but poor quality PRs can be politely(or otherwise, culture depending) disregarded and the method of generation is not important.

KeplerBoy 15 hours ago|||
Tinygrad sure shared a few opinions on AI PRs on Twitter. I believe the gist was "we have Claude code as well, if that's all you bring don't bother".
all2 9 hours ago||
That's a pretty excellent take, IMO. Just an undirected AI model doesn't do much, especially when the core team has time with the code, domain expertise, _and_ Claude.
cyanydeez 15 hours ago|||
I'm starting to think that if you have an AI repo that's basically about codegen, you should just close all issues automatically, then manually (or whatever) open the ones you/maintainers actually care about. That's about the only way to deal with the signal/noise problem AIs are creating.

Then you could focus fire on fixing whatever issues you prefer, like the script kiddies did with DDoS in the old days.

wat10000 16 hours ago|||
If you’re spending $65,000 on this thing, needing two circuits seems like a minor problem
ycui1986 12 hours ago|||
they could have gone with the Max-Q version of the RTX PRO 6000 and only require a 120V circuit. 10% performance hit, but half the power.

fundamentally, it looks like they are shipping consumer off-the-shelf hardware in a custom box.

ericd 11 hours ago||
Yeah, the other big benefit is that the Max-Q's have blowers that exhaust the hot air out of the box, the workstation cards would each blow their exhaust straight into the intake of the card behind it. The last card in that chain would be cooking, as the air has already been heated up by 1800W, essentially a hair dryer on high.

Or could be the server edition 6000s that just have a heatsink and rely on the case to drive air through them, those are 600W cards.

ivraatiems 15 hours ago|||
The $12,000 one also requires it.
knollimar 13 hours ago|||
Easier to get two circuits than rewire a breaker in an office you might be renting, no?

(I work for an electrical contractor so my sense of ease might be overcorrecting)

markdown 11 hours ago||
And 240v is orders of magnitude more common worldwide than 120v
wat10000 13 hours ago||||
The specs show that it only has one PSU. The docs just say that it has 2 and thus needs two circuits, but I’d guess that was meant to be for the more expensive one.
isatty 15 hours ago|||
Surprisingly affordable but I’m not really interested in the 9070XT.

If it shipped with like 4090+ (for a higher price) it’d be more tempting.

dmarcos 15 hours ago|||
They offered a version a few months ago with 4x5090 for 25k

https://x.com/__tinygrad__/status/1983917797781426511

Stopped due to rising GPU prices:

https://x.com/__tinygrad__/status/2011263292753526978

ycui1986 12 hours ago|||
The 9070XT provides roughly the same inference performance as the RTX PRO 4500 at double the power and half the cost. So this one is optimized for total BOM cost.
imjustmsk 9 hours ago|||
Big companies are pushing cloud really hard, and yeah, the hardware prices are a problem too. People still buy Google Cloud and OneDrive when they could literally pick up an old computer from the trash and Frankenstein it into a NAS server.
harvey9 6 hours ago|||
If I'm spending at least 12k USD on the machine then doing some electrical works to accommodate it is not a big deal.
adrianwaj 11 hours ago|||
"locally-trained models for everyone"

Wouldn't there be a massive duplication of effort in that case? It'll be interesting to see how the costs play out. There are security benefits to think about as well in keeping things local-first.

all2 9 hours ago||
There are multiple efforts for 'folding at home' but for AI models at this point. I get the impression that we will see a frontier model released this year built on a system like this.
kube-system 8 hours ago|||
When you’re dealing with this kind of power it’s easier just to colocate where you’ll typically get two separate feeds of 208v
nutjob2 4 hours ago|||
3200W at ~240V is ~13A, that's just a regular household socket, at least in Europe. I imagine 240V sockets in the US are at least 15A.

No need for separate circuits, just use a double adapter.
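The arithmetic, as a quick sketch (just I = P / V):

```python
# Current draw for a 3200 W load at typical mains voltages: I = P / V.
watts = 3200
for volts in (120, 230, 240):
    amps = watts / volts
    print(f"{volts} V: {amps:.1f} A")
```

At 120V the same load pulls ~27A, well past a single 15-20A US branch circuit, which is why the US configuration needs two circuits in the first place.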

trollbridge 16 hours ago|||
A typical U.S. 240V circuit is actually just two 120V circuits. Fairly trivial to rewire for that.
Salgat 14 hours ago|||
It's more accurate to say that the typical 120V circuit is just a 240V source with the neutral tapped into the midpoint of the transformer winding.
reactordev 14 hours ago||
This. It definitely comes in at a higher voltage.
amluto 10 hours ago||
Sort of? It’s 120V RMS to ground.
razingeden 9 hours ago||
yes, this is accurate for US and “works” but it’s against code here. you’ll get mildly shocked by metallic cabinets and fixtures especially if you’re barefoot and become the new shortest path to ground.

old construction in the US sometimes did this intentionally (so old, the house didn’t have grounds. Or to “pass” an inspection and sell a place) but if a licensed electrician sees this they have to fix it.

I’m dealing with a 75 year old house that’s set up this way; the primary issue this is causing is that a 50amp circuit for their HVACs is taking a shorter path to ground inside the house instead of in the panel.

As a result the 50 amp circuit has blown through several of the common 20amp grounds and neutrals and left them with dead light fixtures and outlets because they’re bridged all over the place.

If an HVAC or two does this, I’d advise against this for your 3200 watt AI rig.

EU, you don’t want to try to energize your ground. They use step down transformers or power supplies capable of taking 115-250 (their systems are 240-250V across the load and neutral lines. Not 120 across the load and neutral like ours.)

in the US, you’re talking about energizing your ground plane with 120v and I don’t want to call that safe… but it’s REALLY NOT SAFE to make yourself the shortest path to ground on, say, a wet bathroom floor, with 220-250v.

projektfu 7 hours ago||||
Yes, if you have a 240V US split-phase circuit you could make a little sub-panel with a 40A breaker feeding two 20A 120V circuits and plug the two power supplies into each side. (1600W would need a 20A breaker because 13.3A would be too much for a 15A circuit.) But it would probably make more sense to just plug them both into the same 40A 240V circuit. If you use NEMA 6-20, make sure you label it appropriately and probably color it red.

In Europe, you could plug the two power supplies into an appropriately sized 240V circuit.

In an apartment you can't rewire, you could set it up in your kitchen, which in the modern US code should have two separate 20A circuits. You will need to put it to sleep while you use appliances.

razingeden 9 hours ago||||
A US circuit is.

But this is re: European 240/250 which is 240 between its load and neutral

I’d say don’t energize either systems ground plane, but , really, don’t do this in EU

0xbadcafebee 12 hours ago||||
I think you're forgetting the wires? If you have one outlet with a 15-20A 120V circuit, then the wiring is almost certainly rated for 15-20A. If you just "combined" two 120V circuits into a 240V circuit, you still need an outlet that is rated for 30A, the wires leading to it also need to be rated for 30A, and it definitely needs a neutral. So you still need a new wire run if you don't have two 120V circuits right where you wanna plug in the box. To pass code you also may need to upsize conduit. If load is continuously near peak, it should be 50A instead of 30.

So basically you need a brand new circuit run if you don't have two 120V circuits next to each other. But if you're spending $65k on a single machine, an extra grand for an electrician to run conduit should be peanuts. While you're at it I would def add a whole-home GFCI, lightning/EMI arrestor, and a UPS at the outlet, so one big shock doesn't send $65k down the toilet.

briandw 11 hours ago|||
Correct me if I’m wrong, but doubling the volts doesn't change the amps, it doubles the watts. Watts = V*A.
0xbadcafebee 6 hours ago|||
Yes; I assumed 30A was minimum requirement for 240V service in US. Apparently I was wrong, 20A 240V is apparently normal. So in theory you could use a pre-existing 20A 120V circuit's wiring for a 240V (assuming it was 12/3 cable). And apparently 4-wire is now the standard for 240V service in US? Jesus we have a weird grid.
subscribed 11 hours ago|||
Doubling the volts halves the amps. P = I * V indeed.
fc417fc802 12 hours ago|||
I think you might've misread GP. (Or maybe I did?)

He's not saying you would use it as two separate 120v circuits sharing a ground but rather as a single 240v circuit. His point is that it's easy to rewire for 240v since it's the same as all the other wiring in your house just with both poles exposed.

Of course you do have to run a new wire rather than repurpose what's already in the wall since you need the entire circuit to yourself. So I think it's not as trivial as he's making out.

But then at that wattage you'll also want to punch an exhaust fan in for waste heat so it's not like you won't already be making some modifications.

projektfu 7 hours ago||
The wiring (at least in the US) to the 120V outlets is just one half of the split-phase 240V. If you want to send 240V down a particular wire, you can do that, by changing the breaker, but then you lose the neutral. You also make the wires dangerous to people who don't realize that the white wire is now energized at 120V over ground. (Though it's best to test to be sure anyway, as polarity gets reversed by accident, etc.) Live wires should be black or red.
doubled112 16 hours ago||||
I’ve actually had half of my dryer outlet fail when half of the breaker failed.

Can confirm.

amluto 15 hours ago||||
Sometimes. 240V circuits may or may not have a neutral.
jcgrillo 15 hours ago|||
If you actually use two 120V circuits that way and one breaker flips the other half will send 120V through the load back into the other circuit. So while that circuit's breaker is flipped it is still live. Very bad. Much better to use a 240V breaker that picks up two rails in the panel.
HWR_14 4 hours ago|||
They make connected circuit breakers for this use case, where one tripping automatically trips both.
amluto 10 hours ago||||
I assume the device has two separate PSUs, each of which accepts 120-240V, and neither of which will backfeed its supply.
ycui1986 12 hours ago|||
i am guessing, without any proof, that when one breaker fails the server loses it all, or loses two GPUs, depending on whether the one connected to the CPU side failed.
fc417fc802 11 hours ago||
GPUs aren't electrically isolated from the motherboard though. An entire computer is a single unified power domain.

The only place where there's isolation is stuff like USB ports to avoid dangerous ground loop currents.

That said I believe the PSU itself provides full isolation and won't backfeed so using two on separate circuits should (maybe?) be safe. Although if one circuit tripped the other PSU would immediately be way over capacity. Hopefully that doesn't cause an extended brownout before the second one disables itself.

aiiizzz 5 hours ago||
Why is HN so obsessed with whether something is _written_ by AI or not? Who cares? Judge content, not form.

Oh wait, I get it, it's bike shedding.

dddgghhbbfblk 5 hours ago||
I've been seeing variations on your comment a lot on HN lately and I find it a rather vapid way of looking at something so intricate as human communication. Among other things, the medium is the message!
vessenes 16 hours ago||
The exabox is interesting. I wonder who the customer is; after watching the Vera Rubin launch, I cannot imagine deciding I wanted to compete with NVIDIA for hyperscale business right now. Maybe it’s aiming at a value-conscious buyer? Maybe it’s a sensible buy for a (relatively) cash-strapped ML startup; actually I just checked prices, and it looks like Vera Rubin costs half for a similar amount of GPU RAM. I’m certain that the interconnect will not be as good as NV’s.

I have no idea who would buy this. Maybe if you think Vera Rubin is three years out? But NV ships, man, they are shipping.

kulahan 15 hours ago||
Sometimes you can compete with the big boys simply because they built their infra 5+ years ago and it’s not economically viable for them to upgrade yet, because it’s a multi-billion dollar process for them. They can run a deficit to run you out of the business, but if you’re taking less than 0.01% of their business, I doubt they’d give a crap.
zozbot234 16 hours ago||
> The exabox is interesting.

Can it run Crysis?

WithinReason 15 hours ago|||
Only gamers understand that reference

-- Jensen Huang

zargon 7 hours ago||
*Only gamers know that joke.
bastawhiz 16 hours ago||||
Probably; RDNA5 can do graphics. But it would be a huge waste, since you could probably only use one of the 720 GPUs
dist-epoch 15 hours ago|||
Yes, it can generate Crysis with diffusion models at 60 fps.
paxys 14 hours ago||
The problem with all these "AI box" startups is that the product is too expensive for hobbyists, and companies that need to run workloads at scale can always build their own servers and racks and save on the markup (which is substantial). Unless someone can figure out how to get cheaper GPUs & RAM there is really no margin left to squeeze out.
nine_k 13 hours ago||
Would a hedge fund that does not want to trust a public AI cloud just buy chassis, mobos, GPUs, etc, and build an equivalent themselves? I suspect they value their time differently.
paxys 9 hours ago||
Why do you think a hedge fund can't hire a couple of IT guys? Most of the larger ones have technical operations that would put big tech to shame.
ViscountPenguin 4 hours ago|||
Medium sized hedge funds are a good portion of the market, and only really want to hire just enough tech people to keep the quant pipelines running.
signal_v1 5 hours ago|||
[dead]
qubex 6 hours ago|||
They’re kickstarting a TINY device that is pocketable and aimed at consumers. I’ve backed it (full disclosure).
jgrizou 1 hour ago|||
https://www.kickstarter.com/projects/tiinyai/tiiny-ai-pocket...
ankaz 47 minutes ago|||
[dead]
kkralev 13 hours ago||
i think the real gap isn't at the high end tho. there's a whole segment of people who just want to run a 7-8b model locally for personal use without dealing with cloud APIs or sending their data somewhere. you don't need 4 GPUs for that, a jetson or even a mini pc with decent RAM handles it fine. the $12k+ market feels like it's chasing a different customer than the one who actually cares about offline/private AI
wmf 12 hours ago||
> just want to run a 7-8b model locally

This is already solved by running LM Studio on a normal computer.

zozbot234 12 hours ago||
Ollama or llama.cpp are also common alternatives. But a 8B model isn't going to have much real-world knowledge or be highly reliable for agentic workloads, so it makes sense that people will want more than that.
zach_vantio 10 hours ago||
the compute density is insane. but giving a 70B model actual write access locally for agentic workloads is a massive liability. they still hallucinate too much. raw compute without strict state control is basically just a blast radius waiting to happen.
alexfromapex 12 hours ago||
$12,000 for the base model is insane. I have an Apple M3 Max with 128GB RAM that can run 120B parameter models using like 80 watts of electricity at about 15-20 tokens/sec. It's not amazing for 120B parameter models but it's also not 12 grand.
Thaxll 12 hours ago||
M3 max tflops is tiny compared to the 12k box. It's not even comparable.
davej 7 hours ago|||
It is very comparable if you work out the $/tok/s on inference. I did some napkin math and it looks like you’re getting roughly 3x the performance for 3x the cost. Red v2 vs Mac Studio M3 Ultra 96GB.

If you compare tokens/kWh efficiency then my math has Mac Studio being about 1.5x more efficient.
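A sketch of that napkin math; the prices and token rates here are illustrative placeholders to show the ratio, not benchmarks or quoted figures:

```python
# Hypothetical $-per-(tok/s) comparison. Both the price and the tok/s
# numbers below are assumptions for illustration, not measured values.
boxes = {"red v2": (12_000, 60), "mac studio": (4_000, 20)}

for name, (price_usd, tok_per_s) in boxes.items():
    print(f"{name}: ${price_usd / tok_per_s:.0f} per tok/s")
```

With assumed numbers in a 3:1 ratio on both axes, the $/tok/s comes out identical, which is the "roughly 3x the performance for 3x the cost" observation.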

zozbot234 12 hours ago|||
M3 has tolerable decode performance for the price, and that's what people would care about most of the time. they underperform severely wrt. prefill, but that's a fraction of the workload. AI, even agentic AI, spends most of its time outputting tokens, not processing context in bulk.
segmondy 12 hours ago||
it's for fools. i bought 160gb of vram for $1000 last year. 96gb of p40 VRAM can be had for under $1000. And it will run gpt-oss-120b Q8 at probably 30tk/sec
timschmidt 12 hours ago||
P40 is Pascal architecture (the old Tesla product line), which is no longer receiving driver or CUDA updates. And only available as used hardware. Fine for hobbyists, startups, and home labs, but there is likely a growing market of businesses too large to depend on used gear from eBay, but too small for a full rack solution from Nvidia. Seems like that's who they're targeting.
segmondy 12 hours ago||
99% of interest is in inference. If you want to fine-tune a model, just rent the best gpu in the cloud. It's often cheaper and faster.
timschmidt 11 hours ago||
Great option if you don't mind sharing your data with the cloud. Some businesses want to own the hardware their data resides on.
cootsnuck 11 hours ago|||
How many businesses have the capabilities and expertise to train their own models?
timschmidt 11 hours ago||
No idea. Probably more every day.
segmondy 11 hours ago|||
renting GPU, how is that sharing data with the cloud? you can rent GPU from GCP or AWS
timschmidt 10 hours ago||
I suppose if I rent a cloud GPU and just let it sit there dark and do nothing then I wouldn't have to move any data to it. Otherwise, I'm uploading some kind of work for it to do. And that usually involves some data to operate on. Even if it's just prompts.
roarcher 10 hours ago||
> In order to keep prices low and quality high, we don't offer any customization to the box or ordering process. If you aren't capable of ordering through the website, I'm sorry but we won't be able to help.

Has this guy never worked on a B2B product before? Nobody is going to order a $10 million piece of infrastructure through your website's order form. And they are definitely going to want to negotiate something, even if it's just a warranty. And you'll do it because they're waving a $10 million check in your face.

The tone of this website is arrogant to the point of being almost hostile. The guy behind this seems to think that his name carries enough weight to dictate terms like this, among other things like requiring candidates to have already contributed to his product to even be considered for a job. I would be extremely surprised if anyone except him thinks he's that important.

codemog 5 hours ago||
I haven’t seen tinygrad used for any mainstream production project or thing of value, yet.

Besides a lot of self congratulatory pats on the back for how elegant it is. Honestly, when I read it, it looked confusing as all the other ML libraries. Not actually simple like Karpathy’s stuff.

All that to say, I do really want it to succeed. They should probably hire some practical engineers and not just guys and gals congratulating themselves how elegant and awesome they are.

jen729w 10 hours ago|||
Your framing of this section is misleading. On the site it's preceded by a FAQ-style 'question':

> Can you fill out this supplier onboarding form?

That's very important context, as anyone who has been asked to fill out a supplier onboarding form (hi) will attest.

roarcher 9 hours ago||
Filling out an onboarding form is an example of what he's not willing to do, not the only thing he isn't willing to do.

> we don't offer any customization to the box or ordering process

Every B2B deal of that size that I've ever seen requires at least weeks of meetings between the customer and vendor, in which every detail is at least discussed if not negotiated. That would certainly constitute a "customization" to this guy's prescribed ordering process, which is to "Buy it now" [1] through the website at the stated price like you're ordering a jar of peanuts on Amazon. This is not "framing", it's what the guy said. If it isn't what he meant then he needs to fix his copy.

[1] Yes, there is an actual "Buy it now" button for a $65,000 business purchase that takes you to a page that looks just like a Stripe form. There isn't even a textbox for delivery instructions. Wild.

awesomeMilou 8 hours ago||
Then if they succeed, I guess you're going to see a different process for the first time in your life.

On a website where we frequently talk about disruptive business models, this whole attitude kinda stinks.

roarcher 7 hours ago|||
> Then if they succeed, I guess you're going to see a different process for the first time in your life.

Sure, I guess. Far more likely that they won't succeed, and it will be because of their pointless refusal to cooperate with others. I'm curious why you think we should "disrupt" companies putting a little due diligence into massive purchases.

> On a website where we frequently talk about disruptive business models, this whole attitude kinda stinks.

I could say the same thing about making a comment like this on a website where groupthink is rightfully mocked.

pegasus 5 hours ago|||
> you're going to see a different process for the first time in your life

That sounds very neutral, but wouldn't this, by removing the human element and flexibility from business transactions, be a further step along a general enshittification trend?

phrotoma 4 hours ago|||
> arrogant to the point of being almost hostile

First encounter with geohot eh?

HWR_14 4 hours ago|||
There isn't a $10MM device right now, just $65k and under. I doubt the order process will remain the same in 12 months when the $10MM device becomes available
wmf 10 hours ago|||
He's not actually selling the exabox yet. It sounds like he put up a hypothetical config to see if anyone is interested.
Havoc 3 hours ago|||
> arrogant to the point of being almost hostile.

The YouTube rap video of geohot telling Sony lawyers suing him to blow him is still up.

His style of dealing with corporate matters is certainly unconventional

kube-system 8 hours ago|||
The specs for the “exabox” scream “this is a joke” to me.

> 20,000 lbs

> concrete slab

Huge-scale IT systems are typically delivered in one or more 42/44u cabinets, and are designed to be installed on raised floors.

0xbadcafebee 6 hours ago|||
It's a shipping container. Look at the dimensions. They say concrete slab probably half as a joke, half because building codes would require one, since the container would count as a non-temporary structure.
wmf 8 hours ago||||
It's a shipping container that you install outdoors.
kube-system 8 hours ago||
Are you referring to the images of branded shipping containers on their Twitter page that have visible Gemini watermarks … and jokes in the comments about AI trailer parks?
wmf 8 hours ago||
20x8x8.5 ft are the dimensions of a half-length shipping container. You think that render is a joke but it's not. They don't have photos yet because it's a 2027 product (if it actually comes out, which I would bet against).
roarcher 8 hours ago|||
It's also funny that they explicitly list driver quality as "good" for the base option and "great" for the intermediate one. You're really going to deliberately provide worse drivers for the machine I paid you for, just because I didn't buy the more expensive one?

I mean I'm sure lots of companies do this in practice because tickets for higher-paying customers naturally get prioritized, but directly stating your intention to do it on your home page is hilarious.

wmf 8 hours ago|||
Nvidia's drivers are better than AMD's. It's not really something they have control over. Geohot is definitely obsessed with bitching about driver bugs though.
roarcher 8 hours ago||
That may be, but then it's an inside joke that many of his customers won't get. It just looks like a "fuck you" to anyone buying the cheaper system.

This guy desperately needs a marketing intern to look over his copy. Or hell, anyone who knows how to talk to humans.

fwipsy 8 hours ago||
Not a joke. It's just true.
roarcher 8 hours ago||
It doesn't matter if it's a joke. The non-technical manager or VP making this purchase will not understand it and will expect poor treatment from this vendor, an expectation that will be reinforced by numerous other things on this page. There is no reason to include it at all.
kube-system 8 hours ago|||
It doesn’t read as if they actually care about broad appeal, given their plain refusal to accommodate traditional procurement processes
pegasus 4 hours ago||
So they're only interested in taking on customers who are OK with being treated poorly?
vkazanov 6 hours ago|||
It seems that you work a lot with managers who have no clue what they are buying and why.

I mean, you're not wrong: buying enterprise software from Oracle or Microsoft or Salesforce is pure pain.

But nobody expects buying niche hardware from a tiny vendor to involve the usual 128 pre/post sale meetings and 256 hours of professional services.

Also, the VPs buying these things usually do understand the difference between the AMD and Nvidia stacks really well. Like, really-really well.

roarcher 6 hours ago||
> It seems that you work a lot with managers who have no clue what they are buying and why.

There are certain quirks of this platform's user base that always make me laugh. For example, HNers absolutely love to imply something condescending about the other guy's workplace in order to make their point.

Watch this, I can do it too: Working with managers who make $65,000 (or $10 million) purchases with no more due diligence than reading a marketing page and clicking "Buy it now" is not the flex you think it is.

vkazanov 4 hours ago||
I was involved in IT-related deals on both the purchasing and selling sides. The sums involved were larger than both numbers you mentioned.

And I honestly see almost no correlation between the amount of negotiation involved, and value received.

Some of the most useful things we've integrated were either free or meant that only the "buy it now" button had to be clicked.

Some of the absolutely worst systems I had to work with were purchased after making a call to that "let us know" number.

This tiny guy is mostly saying that he doesn't have the time for enterprise blah-blah. I'm not sure he can organise enterprise sales with this attitude, but I can definitely relate to it!

kube-system 8 hours ago||||
I took that as a dig against AMD vs Nvidia driver quality.
zekrioca 8 hours ago|||
I guess it is called ‘honesty’.
jrflowers 10 hours ago||
I imagine that the FAQ might get updated when there’s actually a $10M machine for sale
roarcher 10 hours ago||
Maybe. Frankly I'd be very surprised if any business ordered a $65k machine that way either.
jrflowers 9 hours ago||
Yeah it’s a little odd. Maybe they are meant to be really really cool toys? People regularly spend more than $65k on things like cars to show off, so it could be like that.

I have no use for these but I might buy one anyway if I won the lottery. ¯\_(ツ)_/¯

siliconc0w 14 hours ago||
Tinybox is cool but I think the market is maybe looking more for a turn-key explicit promise of some level of intelligence @ a certain Tok/s like "Kimi 2.5 at 50Tok/s".
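Napkin math makes that kind of promise easy to sanity-check: decode speed on boxes like these is mostly memory-bandwidth-bound, so tok/s is roughly usable bandwidth divided by bytes read per token. A minimal sketch (all the numbers below are illustrative assumptions, not measured specs of any tinybox or model):

```python
# Back-of-envelope decode throughput for a bandwidth-bound LLM.
# For MoE models, only the active experts' weights are read per token.
def decode_toks_per_sec(params_b: float, bytes_per_param: float,
                        bandwidth_gbs: float, active_frac: float = 1.0) -> float:
    """tok/s ~= memory bandwidth / bytes streamed per generated token."""
    bytes_per_token = params_b * 1e9 * bytes_per_param * active_frac
    return bandwidth_gbs * 1e9 / bytes_per_token

# Hypothetical: a 120B-param MoE with ~5B active params, 4-bit weights
# (~0.5 bytes/param), on ~1 TB/s of aggregate VRAM bandwidth.
print(round(decode_toks_per_sec(120, 0.5, 1000, active_frac=5/120)))  # ~400 tok/s ceiling
```

Real numbers come in well under this ceiling (kernel overhead, KV cache reads, batching), but it's enough to tell whether a "model X at Y tok/s" promise is even physically plausible on the advertised hardware.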
hmokiguess 13 hours ago||
Is this like the new equivalent of crypto mining? I remember the early days when they would sell hardware for farming crypto, now it’s AI?
latchkey 13 hours ago|
Kind of yes, except there is no block reward.
barnabee 1 hour ago||
The block reward is firing humans and collecting ad revenue for slop
mellosouls 10 hours ago||
Where is the 120B documented? This seems to be an editorialized title.

Edit: found a third party referencing the claim but it doesn't belong in the title here I think:

Meet the World’s Smallest ‘Supercomputer’ from Tiiny AI; A Machine Bold Enough to Run 120B AI Models Right in the Palm of Your Hand

https://wccftech.com/meet-the-worlds-smallest-supercomputer-...

Aurornis 10 hours ago|
That third party link is from a different company (Tiiny with an extra i)

Now I'm wondering if the HN title was submitted by some AI bot that couldn't tell the difference.

mellosouls 6 hours ago|||
Ha, good catch, I googled for Tinybox 120B and clearly didn't read the article beyond the seeming match.
adrianwaj 14 hours ago|
Perhaps this company should think about acting as a landlord for their hardware. You buy (or lease) but they also offer colocation hosting. They could partner with crypto miners who are transitioning to AI factories to find the space and power to do this. I wonder if the machines require added cooling, though, in what would otherwise be a crypto mining center. CoreWeave made the transition and also do colocation. The switchover is real.

I think Tinygrad should think about recycling. Are they planning ahead in this regard? Is anyone? My thought is that if there were a central database of who owns what and where, then at least when the recycling tech becomes available, people will know where to source their specific trash (and even pay for it). Having a database like that in the first place could even fuel the industry.
