Posted by Torq_boi 2 days ago

Don't rent the cloud, own instead (blog.comma.ai)
1177 points | 492 comments | page 5
evertheylen 2 days ago|
> Maintaining a data center is much more about solving real-world challenges. The cloud requires expertise in company-specific APIs and billing systems. A data center requires knowledge of Watts, bits, and FLOPs. I know which one I'd rather think about.

I find this to be applicable on a smaller scale too! I'd rather set up and debug a beefy Linux VPS via SSH than fiddle with various proprietary cloud APIs/interfaces. Doesn't go as low-level as Watts, bits, and FLOPs, but I still consider knowledge about Linux more valuable than knowing which Azure knobs to turn.

0xbadcafebee 1 day ago||

  If your business relies on compute, and you run that compute in the cloud, you are putting a lot of trust in your cloud provider. Cloud companies generally make onboarding very easy, and offboarding very difficult. If you are not vigilant you will sleepwalk into a situation of high cloud costs and no way out. If you want to control your own destiny, you must run your own compute.
This is not a valid reason for running your own datacenter, or running your own server.

  Self-reliance is great, but there are other benefits to running your own compute. It inspires good engineering. Maintaining a data center is much more about solving real-world challenges. The cloud requires expertise in company-specific APIs and billing systems. A data center requires knowledge of Watts, bits, and FLOPs. I know which one I'd rather think about.
This is not a valid reason for running your own datacenter, or running your own server.

  Avoiding the cloud for ML also creates better incentives for engineers. Engineers generally want to improve things. In ML many problems go away by just using more compute. In the cloud that means improvements are just a budget increase away. This locks you into inefficient and expensive solutions. Instead, when all you have available is your current compute, the quickest improvements are usually speeding up your code, or fixing fundamental issues.
This is not a valid reason for owning a datacenter, or running your own server.

  Finally there’s cost, owning a data center can be far cheaper than renting in the cloud. Especially if your compute or storage needs are fairly consistent, which tends to be true if you are in the business of training or running models. In comma’s case I estimate we’ve spent ~5M on our data center, and we would have spent 25M+ had we done the same things in the cloud.
This is one of only two valid reasons for owning a datacenter, and one of several valid reasons for running your own server.

The only two valid reasons to build/operate a datacenter: 1) what you're doing is so costly that building your own factory is the only profitable way for your business to produce its widgets, 2) you can't find a datacenter with the location or capacity you need and there is no other way to serve your business needs.

There are many valid reasons to run your own servers (colo), although most people will not run into them in a business setting.
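As a back-of-the-envelope illustration of the cost point in the quoted passage, here is a minimal own-vs-rent sketch in Python. Every number in it (hardware price, lifetime, power/colo cost, cloud hourly rate, utilization) is an assumption picked for illustration, not comma's actual figures:

  # Rough own-vs-rent comparison for a steady training workload.
  # All inputs below are illustrative assumptions, not real quotes.
  HOURS_PER_YEAR = 24 * 365

  capex = 250_000          # assumed purchase price of one GPU node, USD
  lifetime_years = 4       # assumed useful life of the hardware
  power_and_colo = 2_500   # assumed monthly power + colo cost, USD
  cloud_hourly = 25.0      # assumed on-demand rate for a comparable node
  utilization = 0.80       # fraction of the year the node is actually busy

  own_per_year = capex / lifetime_years + 12 * power_and_colo
  rent_per_year = cloud_hourly * HOURS_PER_YEAR * utilization

  print(f"own:  ${own_per_year:,.0f}/yr")    # ~$92,500/yr
  print(f"rent: ${rent_per_year:,.0f}/yr")   # ~$175,200/yr

At high, steady utilization the amortized owned node comes out well ahead; at low utilization the cloud's pay-per-hour model catches up, which is exactly the "fairly consistent" caveat in the quote.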

MagicMoonlight 1 day ago||
For ML it makes sense, because you’re using so much compute that renting it is just burning money.

For most businesses, it’s a false economy. Hardware is cheap, but having proper redundancy and multiple sites isn’t. Having a 24/7 team available to respond to issues isn’t.

What happens if their data centre loses power? What if it burns down?

swordsith 1 day ago||
Recently learned about Tailscale and have been accessing my project from my phone. It's been a game changer. The fact that they support teams of up to 3 people and 100 devices on the free plan is awesome imo. Running locally just makes me feel so much more comfortable.
yawnxyz 10 hours ago||
Running your own AI inference is quite stressful, and reading this article definitely makes me feel stressed.
imcritic 1 day ago||
I love articles like this and companies with this kind of openness. Mad respect to them for this article and for sharing software solutions!
satvikpendem 2 days ago||
I just read about Railway doing something similar. Sadly, their prices are still high compared to other bare-metal providers and even a VPS such as Hetzner with Dokploy, which has a very similar feature set yet gives you way more CPU, storage, and RAM for the same 5 dollars.

https://blog.railway.com/p/launch-week-02-welcome

dist-epoch 2 days ago|
Their pricing page is so confusing: CPU: $0.00000772 per vCPU / sec

This seems to imply $40 / month for 2 vCPU which seems very high?

Or maybe they mean "used" CPU versus idle?
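A quick sanity check on that arithmetic, assuming a ~30-day month and billing on provisioned (not actually used) vCPUs:

  # Sanity check on the quoted Railway rate for 2 always-on vCPUs.
  RATE_PER_VCPU_SEC = 0.00000772      # USD, from the pricing page above
  SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 for a 30-day month

  monthly = RATE_PER_VCPU_SEC * 2 * SECONDS_PER_MONTH
  print(f"2 vCPUs, always on: ${monthly:.2f}/month")  # ~$40.02

So the ~$40/month reading checks out if they bill on provisioned vCPU-seconds; if they only bill CPU actually consumed, a mostly idle service would cost far less.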

Neil44 2 days ago||
Billing per used (non-idle) CPU cycle would be quite interesting. The number of cores would just effectively be your cost cap. Efficiency would be even more important. And if the provider oversubscribes cores, you just pay less. Actually, that's probably why they don't do it...
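A rough sketch of the difference that billing model would make; the rate is reused from the comment above, and the 20% average utilization is purely an invented example:

  # Hypothetical comparison: billing provisioned vCPU-seconds vs. only
  # the CPU-seconds actually consumed. Utilization figure is made up.
  RATE = 0.00000772          # USD per vCPU-second
  SECONDS = 30 * 24 * 3600   # one ~30-day month
  VCPUS = 2
  UTILIZATION = 0.20         # fraction of cycles actually busy

  provisioned = RATE * VCPUS * SECONDS                # pay for the cores
  usage_based = RATE * VCPUS * SECONDS * UTILIZATION  # pay for busy cycles
  print(f"provisioned: ${provisioned:.2f}, usage-based: ${usage_based:.2f}")
  # At 100% utilization the two are equal, so core count acts as the cap.

Under these assumptions that's roughly $40 provisioned vs. $8 usage-based per month.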
efreak 1 day ago||
Don't most big clouds avoid sharing cores between tenants? I have a vague feeling this stopped around Spectre/Meltdown. I wouldn't be surprised to be wrong, but if you're dedicating a core to a VM, you're not going to charge less for unused CPU that nobody else can use.
arjie 2 days ago||
Realistically, it's the speed with which you can expand and contract. The cloud gives unbounded flexibility - not on the per-request scale or whatever, but on the per-project scale. To try things out with a bunch of EC2s or GCEs is cheap. You have it for a while and then you let it go. I say this as someone with terabytes of RAM in servers, and a cabinet I have in the Bay Area.
ex-aws-dude 1 day ago|
I can see how this would work fine if the primary purpose is training rather than serving large volumes of customer traffic in multiple regions.

It would probably even make sense for some companies to still use the cloud for their API but do the training on-prem, as that may be the expensive part.
