
Posted by mpweiher 10/26/2025

We saved $500k per year by rolling our own "S3" (engineering.nanit.com)
328 points | 251 comments
none2585 10/27/2025|
I'm curious how many engineers per year this costs to maintain
CaptainOfCoit 10/27/2025||
> I'm curious how many engineers per year this costs to maintain

The end of the article has this:

> Consider custom infrastructure when you have both: sufficient scale for meaningful cost savings, and specific constraints that enable a simple solution. The engineering effort to build and maintain your system must be less than the infrastructure costs it eliminates. In our case, specific requirements (ephemeral storage, loss tolerance, S3 fallback) let us build something simple enough that maintenance costs stay low. Without both factors, stick with managed services.

Seems they were well aware of the tradeoffs.

codedokode 10/27/2025|||
And I am curious how many engineer-years it takes to port code to cloud services and deal with the multiple issues you cannot even debug because you don't have root privileges in the cloud.

Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to DB. And no weird network issues to debug.
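For what it's worth, the whole pattern fits in a few lines. A minimal sketch using sqlite3 and a temp directory (paths, table schema, and filenames are all invented for illustration):

```python
import os
import sqlite3
import tempfile

def save_file(storage_dir, db, name, data):
    """Write bytes to local disk, then record the file in the database."""
    path = os.path.join(storage_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    with db:  # commits the INSERT on success
        db.execute(
            "INSERT INTO files (name, path, size) VALUES (?, ?, ?)",
            (name, path, len(data)),
        )
    return path

storage = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (name TEXT, path TEXT, size INTEGER)")
save_file(storage, db, "frame.jpg", b"\xff\xd8\xff")
```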

rajamaka 10/27/2025|||
> as simple as "with open(...) as f: f.write(data)"

Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...

Without on-prem, saving a file is as simple as s3.put_object() !
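And that call really is about one line. A hedged sketch with boto3, where the bucket name and key layout are invented for illustration and AWS credentials are assumed to already be configured:

```python
def object_key(user_id, filename):
    # Key layout is an invented example, not anything from the article.
    return f"uploads/{user_id}/{filename}"

def upload_example(data: bytes):
    # Requires boto3 and configured AWS credentials; not executed here.
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="example-bucket",  # hypothetical bucket
        Key=object_key("u123", "clip.mp4"),
        Body=data,
    )
```

Redundancy, backups, and hardware are S3's problem at that point; access policies and network topology, of course, are still yours to configure.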

AdieuToLogic 10/27/2025|||
>> Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to DB.

> Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...

Most of these concerns can be addressed with ZFS[0] provided by FreeBSD systems hosted in triple-A data centers.

See also iSCSI[1].

0 - https://docs.freebsd.org/en/books/handbook/zfs/

1 - https://en.wikipedia.org/wiki/ISCSI

SXX 10/27/2025||
Except running ZFS on FreeBSD would certainly require a dedicated devops person with a very specific skillset that the majority of people on the market don't have.
Rohansi 10/27/2025||||
I don't think any of those mattered for their use case. That's why they didn't actually need S3.
codedokode 10/27/2025||||
With s3, you cannot use ls, grep and other tools.

> Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...

Wow that's a lot to learn before using s3... I wonder how much it costs in salaries.

> With what network topology?

You don't need to care about this when using SSDs/HDDs.

> With what access policies?

Whichever is defined in your code, no restrictions unlike in S3. No need to study complicated AWS documentation and navigate through multiple consoles (this also costs you salaries by the way). No risk of leaking files due to misconfigured cloud services.

> With what backup strategy?

Automatically backed up with the rest of your server data; no need to spend time on this.

rajamaka 10/27/2025|||
> You don't need to care about this when using SSDs/HDDs.

You do need to care when you move beyond a single server in a closet that runs your database, webserver and storage.

> No risk of leaking files due to misconfigured cloud services.

One misconfigured .htaccess file for example, could result in leaking files.

codedokode 10/27/2025|||
> One misconfigured .htaccess

First, I hope nobody is using Apache anymore; second, you typically store files outside of the web directory.

pestaa 10/27/2025||
Why should nobody use Apache? I rediscovered it to be great in many use cases. And there are LLMs to help with the config syntax.
codedokode 10/27/2025||
Performance is not great compared to nginx.
AdieuToLogic 10/27/2025||||
>> No risk of leaking files due to misconfigured cloud services.

> One misconfigured .htaccess file for example, could result in leaking files.

I don't think you are making a compelling case here, since both scenarios result in an undesirable exposure. Unless your point is both cloud services and local file systems can be equally exploited?

Nextgrid 10/27/2025|||
With bare-metal machines you can go very far before needing to scale beyond one machine.
inlined 10/27/2025||||
It sounds like you’re not at the scale where cloud storage is obviously useful. By the time you definitely need S3/GCS you have problems making sure files are accessible everywhere. “Grep” is a ludicrous proposition against large blob stores
coderintherye 10/27/2025|||
I mean you can easily mount the S3 bucket to the local filesystem (e.g. using s3fs-fuse) and then use standard command line tools such as ls and grep.
hallman76 10/27/2025|||
I inherited an S3 bucket where hundreds of thousands of files were written to the bucket root. Every filename was just a UUID. ls might work after waiting to page through to get every file. To grep you would need to download 5 TB.
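To make the pain concrete, here is roughly what "ls" turns into against a bucket like that. A sketch with a hypothetical bucket name; each page of list_objects_v2 is a separate network round trip of at most 1,000 keys, so hundreds of thousands of objects means hundreds of round trips:

```python
def keys_from_pages(pages):
    """Flatten ListObjectsV2-style result pages into one list of key names."""
    keys = []
    for page in pages:
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys

def list_all_keys(bucket):
    # Requires boto3 and configured AWS credentials; not executed here.
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    return keys_from_pages(paginator.paginate(Bucket=bucket))
```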
codedokode 10/27/2025|||
It's probably going to be dog slow. I dealt with HDDs where just iterating through all files and directories takes hours, and network storage is going to be even slower at this scale.
bcrosby95 10/27/2025|||
You can't ever definitively answer most of those questions on someone else's cloud. You just take Amazon's word for whatever number of nines they claim it has.
rajamaka 10/27/2025||
Not needing to ask the questions is the selling point.
grebc 10/27/2025|||
Bro were you off grid last week. Your questions equally apply to AWS, you just magically handwave away all those questions as if AWS/GCP/Azure outages aren’t a thing.
patrick451 10/27/2025|||
Until it goes down because AWS STILL hasn't made themselves completely multi-region or can't figure out their DNS.
beoberha 10/27/2025||||
A lot of reductive anti-cloud stuff gets posted here, but this might be the granddaddy of them all.
mjr00 10/27/2025||||
> Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to DB. And no weird network issues to debug.

There may be some additional features that S3 has over a direct filesystem write to a SSD in your closet. The people paying for cloud spend are paying for those features.

RedShift1 10/27/2025||||
Ah that is where logging and traceability comes in! But not to worry, the cloud has excellent tools for that! The fact that logging and tracing will become half your cloud cost, oh well let's just sweep that under the rug.
hinkley 10/27/2025|||
Variation on an old classic.

Question: How do you save a small fortune in cloud savings?

Answer: First start with a large fortune.

nbngeorcjhe 10/27/2025|||
A small fraction of 1, probably? It sounds like a fairly simple service that shouldn't require much ongoing development.
codedokode 10/27/2025|||
Especially if you have access to LLMs.
hinkley 10/27/2025|||
You're going to run a production system with a bus number of 1?

I think you mean a small fraction of 3 engineers. And small fractions aren't that small.

adrianN 10/27/2025|||
So far I have seen a lot more production systems with a bus factor of zero than production systems with a bus factor greater than one.
xboxnolifes 10/27/2025|||
The cost being a fraction of 1 does not imply it's one person. 3 people each spending 2 weeks a year on the service is still a fraction of 1.
hinkley 10/27/2025||
It is three opportunity costs. No free lunches.
Dylan16807 10/27/2025||
Nobody implied it was free. Yes there are opportunity costs, and they add up to less than one sysadmin of opportunity.
UseofWeapons1 10/27/2025|||
Yes, that was my thought as well. Breakeven might be like 1 (give or take 2x)?
hinkley 10/27/2025||
Anything worth doing needs three people. Even if they also are used for other things.
codedokode 10/27/2025|||
What I notice is that large companies use their own private clouds and datacenters. At their scale, it is cheaper to have their own storage. As a side business, they also sell cloud services themselves. And small companies probably don't have enough data to justify paying for a cloud instead of buying several SSDs/HDDs or creating an SMB share on their Windows server.
kingnothing 10/27/2025||
Nanit is horrible spyware. Do not buy their products.

If you have a router that lets you inspect data flowing out, you'll be astonished at what your little Nanit cam exfiltrates from your home network. Even if you don't pay for their subscription service, they still attempt to exfil all of the video footage caught on your camera to their servers. You can block it and it will still work, but you shouldn't have to do that in the first place if you don't pay for their cloud service.

Stay away if you value your privacy.

DevelopingElk 10/27/2025||
Video processing is one of those things that needs caution when done serverlessly. This solution makes sense, especially because S3's durability guarantees aren't needed.
dxxvi 10/27/2025||
So, you want a place to store many files in a short period of time and when there's a new file, somebody must be notified?

Have you ever thought of using a PostgreSQL db (also on AWS) to store those files and use CDC to publish messages about those files to a Kafka topic? The original way needs 3 AWS services: S3, Lambda and SQS. This way needs 2: PostgreSQL and Kafka. I'm not sure how well this method works though :-)

ravedave5 10/27/2025||
I've dealt with images in a database and it was a disaster; the transfer times are garbage.
jrochkind1 10/27/2025||
Like put the video blobs themselves in postgres data columns? Does putting very large (relative to what you normally put in postgres) files in pg work well? Genuine question, i do not know, I've been considering it too and hesitant about it.
dxxvi 10/27/2025|||
I already checked with AI before putting the comment :-)

1GB with the bytea data type (https://www.postgresql.org/docs/current/datatype-binary.html) and 4TB with the BLOB data type (https://wiki.postgresql.org/wiki/BinaryFilesInDB).
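The mechanics are ordinary DB-API code either way. A minimal sketch of the round trip (table name and schema are invented; with psycopg2 the SQL placeholder is %s, with sqlite3 it is ?), which says nothing about how well it performs at multi-GB sizes:

```python
def store_blob(conn, name, data, placeholder="%s"):
    """Insert a binary payload; works with any DB-API connection."""
    with conn:  # commit on success
        conn.cursor().execute(
            f"INSERT INTO files (name, data) VALUES ({placeholder}, {placeholder})",
            (name, data),
        )

def load_blob(conn, name, placeholder="%s"):
    """Fetch the payload back, or None if the name is unknown."""
    cur = conn.cursor()
    cur.execute(f"SELECT data FROM files WHERE name = {placeholder}", (name,))
    row = cur.fetchone()
    return bytes(row[0]) if row else None

# With psycopg2 (hypothetical DSN), bytes are adapted to bytea automatically:
#   conn = psycopg2.connect("dbname=media")
#   store_blob(conn, "clip.mp4", payload)
```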

jrochkind1 10/28/2025||
Have you done this? I can google or AI for the max size that postgres will allow, sure. I have googled in the past for whether this actually works well, and have gotten answers leaning towards most advice against it in real world scenarios.

So if you have experience with this and it did work well, I'm curious to hear about it! That's why i asked about if it worked well, not about the maximum size postgres allowed in various data types.

If you have no experience with it, but are just posting advice based on what AI tells you about max sizes of data allowed by pg that I can get from the same source too, then okay, fair enough, and certainly no need to give me any more of that!

dxxvi 10/27/2025|||
> I've been considering it too and hesitant about it

Why hesitant? Just ask AI. It'll tell you how to do it and then you can experiment with it yourself.

gethly 10/27/2025||
I made my own S3 as well. I used two S3-compatible services before, but there was always some issue (the first failed to upload a certain file, no matter what, and support was unhelpful; the second did not migrate file metadata properly, so I knew it would be an ongoing problem). In the end, it is just a dumb file store, nothing else. All you need to do is write a basic HTTPS API layer and some logic to handle a database for the file metadata and possibly location. That is about it. Takes a few days with testing.
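As a rough illustration of how small that layer can be, here is a toy version of such a store (in-memory blobs, metadata in sqlite, no auth, no TLS, no replication; every name in it is invented for the sketch):

```python
import hashlib
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, size INTEGER, sha256 TEXT)")
BLOBS = {}  # in-memory stand-in for the on-disk store

class FileStore(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the body and record metadata (size + checksum) in the DB.
        data = self.rfile.read(int(self.headers["Content-Length"]))
        key = self.path.lstrip("/")
        BLOBS[key] = data
        with DB:
            DB.execute("INSERT OR REPLACE INTO meta VALUES (?, ?, ?)",
                       (key, len(data), hashlib.sha256(data).hexdigest()))
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        data = BLOBS.get(self.path.lstrip("/"))
        if data is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the sketch quiet

def serve(port=8080):
    HTTPServer(("127.0.0.1", port), FileStore).serve_forever()
```

Obviously the real thing needs access control, durability, and the upload/CDN layers described below, but the core really is this small.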

But then you also have to think about file uploads and file downloads. You cannot have a single server fulfilling all the roles, otherwise you have a bottleneck.

So this file storage became a private backend service that end-users never access directly. I have added upload services, whose sole purpose is to let users upload files and only then push them to this central file store, essentially creating a distributed file upload queue (there is also a bit more logic regarding file id creation and validation).

Second, my own CDN was needed for downloads, but only because I use custom access handling and could not have used any of the commercial services (though they do support access via tokens, it just was not working for me). This was tricky because I wanted the nodes to distribute files between themselves and not always fetch them from the origin, to avoid network costs on the origin server. So they had to find each other, talk to each other, and know who has which file.

In short, rolling your own is not as hard as it might seem and should be preferable. Maybe to save time, use the cloud at the beginning, but once you are up and running and your business idea is validated by having customers, immediately move to your own infra to avoid the astronomical costs of cloud services.

btw, i also do video processing like mentioned in the blog post :)

gnarlouse 10/27/2025||
because "How we stopped putting your kids in S3 buckets"

just sounded less attractive

szszrk 10/27/2025|
They can't say that, as they did not stop. They made a cache in front of it.
Huxley1 10/27/2025||
S3 certainly saves a lot of hassle, but in certain use cases, it really is prohibitively expensive. Has anyone tried self-hosted alternatives like MinIO or SeaweedFS? Or taken even more radical approaches? How do you balance between stability, maintenance overhead, and cost savings?
ddxv 10/27/2025|
MinIO has moved away from having a free community fork, and I think its base cost is close to $100k a year. I've been using Garage and been happy, but I'm a single dev and orders of magnitude smaller than the OP, so there are certainly edge cases I'm missing to compare the two.
Cerium 10/27/2025||
I'm a fellow new Garage user. I have had a great time so far - but I also don't need much. My use case is to share data analysis results with a small team. I wanted something simple to manage that can provide an s3 like interface to work with off the shelf data analysis tools.
OrangeDelonge 10/27/2025||
Couldn’t they have used S3 express one zone?
elmigranto 10/27/2025||
Classic case of "focus on building your app, not infrastructure". Here's another multi-million dollar idea: put this cache directly inside your own video processing server and upload there.
anshumankmr 10/27/2025|
Some stuff like this also exists: https://www.dell.com/en-in/shop/storage-servers-and-networki...

We could just use something like that

Or there is that other Object storage solution called R1 from Cloudflare.

tauntz 10/27/2025|
* R2