Don't rent the cloud, own instead (blog.comma.ai)

Posted by Torq_boi 2 days ago

1177 points | 491 comments

kavalg 2 days ago

This was one of the coolest job ads that I've ever read :). Congrats on what you have done with your infrastructure, team and product!

HanClinto 1 day ago

Agreed!

Takes the idea of "full stack developer" to a whole new level.

siliconc0w 1 day ago

You can also buy the hardware and hire an IT vendor to rack it and act as smart hands, so you never need to visit the datacenter. With modern beefy hardware, even large web services only need a few racks, so most orgs don't even need to manage a large footprint.

Sure, you have to schedule your own hardware repairs or updates, but it also means you don't need to wrangle with ridiculous cost engineering, reserved instances, cloud product support issues, API deprecations, proprietary configuration languages, etc.

Bare metal is better for a lot of non-cost reasons too: as the article notes, it's just easier to reason about the lower-level primitives, and you get more reliable and repeatable performance.

j45 1 day ago

That’s called managed servers or managed services.

I have run both bare metal and managed services. You just have to be clear about what you have coverage for when disaster strikes, or be willing to proactively replace hard drives before they die.

danpalmer 2 days ago

> Cloud companies generally make onboarding very easy, and offboarding very difficult.

I reckon most on-prem deployments have significantly worse offboarding than the cloud providers. As a cloud provider you can win business by having something for offboarding, but internally you'd never get buy-in to spend on a backup plan if you decide to move to the cloud.

lelanthran 1 day ago

> As a cloud provider you can win business by having something for offboarding, but internally you'd never get buy-in to spend on a backup plan if you decide to move to the cloud.

It's the other way around. How do you think all businesses moved to the cloud in the first place?

danpalmer 1 day ago

My point is that at the point of moving, or creating a new deployment, it's perfectly reasonable to say "how do we get off the cloud if it goes badly", yet no one says "how do we get onto a cloud if managing a datacenter sucks".

The cloud providers win business with at least some hint of offboarding support, but on-prem doesn't have that same incentive.

SomaticPirate 1 day ago

This makes sense for HPC and ML workloads: big batch jobs where you are pushing the hardware and having everything local is a clear advantage. Also, this company sells hardware, so it makes sense for them to have hardware experience. I still think that for most people here, needing to make a physical phone call to their data center team (!!) to rack a server is a nutty proposition. You think the AWS API is slow? Try calling Steve. If you have fixed compute costs after a year, sure, look at pulling some stuff on-prem.

bob1029 1 day ago

The #1 reason I would advocate for using AWS today is the compliance package they bring to the party. No other cloud provider has anything remotely like Artifact. I can pull Amazon's PCI-DSS compliance documentation using an API call. If you have a heavily regulated business (or work with customers who do), AWS is hard to beat.

If you don't have any kind of serious compliance requirement, using Amazon is probably not ideal. I would say that Azure AD is ok too if you have to do Microsoft stuff, but I'd never host an actual VM on that cloud.

Compliance and "Microsoft stuff" covers a lot of real world businesses. Going on prem should only be done if it's actually going to make your life easier. If you have to replicate all of Azure AD or Route53, it might be better to just use the cloud offerings.

wiether 1 day ago

> The #1 reason I would advocate for using AWS today is the compliance package they bring to the party.

I was going to post the same comment.

Most of the people agreeing to foot the AWS bill do it because they see how much the compliance is worth to them.

ynac 1 day ago

Not nearly on the article's level, but I've been operating what I call a fog machine (an itsy-bitsy personal cloud) for about 15 years. It's just a bunch of local and off-site NAS boxes, and it has kinda worked out great. It's mostly Synology, but probably won't be once their scheduled retirement comes up. The networking is dead simple, the power use is distributed, and the size of it all is still a monster for me: back in the day, I used it for a very large audio project, keeping backups of something like 750,000 albums and other audio recordings along with their metadata and assets.

Dormeno 1 day ago

The company I work for used to run a hybrid setup that was 95% on-prem, but it shifted to roughly 90% in the cloud when on-prem became more expensive because of VMware licensing. There are alternatives to VMware, but they aren't officially supported on our hardware configuration, so switching would mean replacing all the hardware, which still makes it more expensive than the cloud. Almost everything we have is cloud-agnostic, and anything that requires resilience sits in two different providers.

Now the company is looking for further cost savings: the buildings rented for running on-prem are sitting mostly unused, and building prices have also gone up notably in recent years, so we're likely to save money by moving into the cloud. This is likely to make the cloud transition permanent.

comrade1234 2 days ago

15 years ago or so, a spreadsheet was floating around where you could enter server costs, compute power, etc., and it would tell you when you would break even by buying instead of going with AWS. I think it was leaked from Amazon, because it always showed three years to break even, even as hardware changed over time.
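The spreadsheet itself isn't reproduced here, but the break-even logic a tool like that computes fits in a few lines. This is a minimal sketch assuming a simple linear model: a one-off hardware outlay (plus an optional one-off migration cost), then flat monthly costs on each side, ignoring depreciation, financing and price changes. The CostModel and break_even_months names and all example figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Hypothetical inputs for a rent-vs-own comparison; every figure is yours to supply."""
    hardware_upfront: float         # one-off purchase price of the servers
    onprem_monthly: float           # colo, power, bandwidth, staff share, etc. per month
    cloud_monthly: float            # equivalent cloud bill per month
    migration_one_off: float = 0.0  # optional one-off cost of moving off the cloud


def break_even_months(m: CostModel) -> Optional[float]:
    """Months until cumulative owning cost falls below cumulative cloud cost.

    Linear model: the upfront costs are paid back by a flat monthly saving.
    Returns None when owning never catches up (saving <= 0).
    """
    monthly_saving = m.cloud_monthly - m.onprem_monthly
    if monthly_saving <= 0:
        return None
    return (m.hardware_upfront + m.migration_one_off) / monthly_saving


if __name__ == "__main__":
    # Made-up figures for illustration only.
    print(break_even_months(CostModel(120_000, 2_000, 6_000)))          # 30.0
    print(break_even_months(CostModel(120_000, 2_000, 6_000, 24_000)))  # 36.0
```

With those invented inputs ($120k of hardware and $2k/month to run it, versus $6k/month in the cloud), owning breaks even after 30 months, and a $24k migration pushes that to 36. A model this simple lands on whatever number its inputs imply, which is why a result that never moves as hardware changes looks suspicious.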

TonyStr 2 days ago

Azure provides their own "Total Cost of Ownership" calculator for this purpose [0]. Notably, it makes you estimate peripheral costs such as the cost of a server administrator, electricity, etc.

[0] - https://azure-int.microsoft.com/en-us/pricing/tco/calculator...

Symbiote 2 days ago

I plugged in our own numbers (60 servers we own in a data centre we rent) and Microsoft thinks this costs us an order of magnitude more than it does.

Their "assumption" for hardware purchase prices seems way off compared to what we buy from Dell or HP.

It's interesting that the "IT labour" cost they estimate is $140k for DIY, and $120k for Azure.

Their saving is 5 times more than what we spend...

TonyStr 1 day ago

Thank you, I've wanted to see someone use this in the real world. When doing Azure certifications (AZ900, AZ204, etc.), they force you to learn about this tool.

Symbiote 1 day ago

I may be out of date with RAM prices. Dell's configuration tool wants £1000 each for 32GB RDIMMs, but prices in Dell's configuration tool are always significantly higher than what we get if we write to their salesperson.

Even so, a rough configuration for a 2-processor 16 core/processor server with 256GiB RAM comes to $20k, vs $22k + 100% = $44k quoted by MS. (The 100% is MS' 20%-per-year "maintenance cost" that they add on to the estimate. In reality this is 0% as everything is under Dell's warranty.)

And most importantly, the tool only compares the cost of Azure to constructing and maintaining a data centre! Unless there are other requirements (which would probably rule out Azure anyway), that's daft; a realistic comparison would be against colocation or hired dedicated servers, depending on the scale.
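The "$22k + 100% = $44k" figure quoted above can be reproduced with a flat surcharge. A small sketch, assuming the 100% comes from applying 20% per year, non-compounding, over five years; the five-year horizon is inferred (100% / 20% per year = 5), not stated by the tool, and the function name is hypothetical.

```python
def estimate_with_flat_maintenance(price: float, annual_rate: float, years: int) -> float:
    """Purchase price plus a flat (non-compounding) surcharge of annual_rate per year."""
    return price * (1 + annual_rate * years)


# Azure TCO tool: $22k hardware assumption with a 20%/year "maintenance" charge.
# The 5-year horizon is an assumption inferred from 100% / 20% per year.
ms_estimate = estimate_with_flat_maintenance(22_000, 0.20, 5)

# Commenter's rough Dell configuration: $20k, maintenance covered by warranty (0%).
actual = estimate_with_flat_maintenance(20_000, 0.0, 5)

print(ms_estimate, actual, ms_estimate / actual)  # 44000.0 20000.0 2.2
```

On the hardware line alone that is a 2.2x gap; the "order of magnitude" difference reported above would come from the tool's other assumptions stacked on top.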

Onavo 2 days ago

Well, somebody should recreate it. I smell a potential startup idea somewhere. There's a ton of "cloud cost optimizer" software, but most of it involves tweaking AWS knobs and taking a cut of the savings. A startup that could offload non-critical services from AWS to colo and traditional bare-metal hosting like Hetzner has a strong future.

One thing to keep in mind is that GPU depreciation (over the last 5 years, at least) has been a bit steeper than a 3-year horizon assumes. Current estimates are that resale value plunges dramatically around the third year. For a top-tier H100, depreciation kicks in around the third year, but the linked analysis says it is even worse for less capable cards like the A100.
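One way to see why a late plunge in resale value matters is to spread the capital lost to depreciation over the months a card is held. A rough sketch: the residual-value fractions below are invented placeholders shaped to mimic a sharp drop around year three, not figures from the linked analysis, and the names and the $30k price are hypothetical.

```python
from typing import Dict

# Invented residual-value fractions (share of purchase price recoverable at resale),
# shaped to mimic a sharp drop around year 3. NOT taken from the linked analysis.
HYPOTHETICAL_RESIDUAL: Dict[int, float] = {1: 0.85, 2: 0.70, 3: 0.40}


def monthly_capital_cost(price: float, residual: Dict[int, float], years_held: int) -> float:
    """Capital lost to depreciation, spread evenly over the months the card is held."""
    resale_value = price * residual[years_held]
    return (price - resale_value) / (12 * years_held)


if __name__ == "__main__":
    for years in (1, 2, 3):
        print(years, monthly_capital_cost(30_000, HYPOTHETICAL_RESIDUAL, years))
```

With these made-up numbers the effective monthly capital cost stays flat for two years and then rises in year three, so a break-even estimate built on a smooth 3-year schedule would look better than it really is.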

https://www.silicondata.com/use-cases/h100-gpu-depreciation/

Note that this doesn't factor in the cost of labor. Labor at SF wages is dreadfully expensive; if your data center is right across the border in Tijuana, on the other hand...

vidarh 2 days ago

If you buy, maybe. Leasing or renting tends to be cheaper from day one. Tack on migration costs and ca. 6 months is a more realistic target. If the spreadsheet always said 3 years, it sounds like an intentional "leak".

g-b-r 2 days ago

Did the AWS part include the egress costs to extract your data from AWS, if you ever want to leave them?

coreylane 1 day ago

AWS says they will waive all egress costs when exiting https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-i...

direwolf20 1 day ago

Because the EU forced them to

durakot 2 days ago

There's the HN I know and love

alecco 1 day ago

Counterpoint: "Why I'm Selling All My GPUs" https://www.youtube.com/watch?v=C6mu2QRVNSE

TL;DW: GPU rental arbitrage is dead. Regulation hell. GPU prices. Rental price erosion. Rising building costs. The complexity of things like backup power. Delays in connecting to the energy grid. Staffing costs.
