
Posted by mpweiher 3 days ago

We saved $500k per year by rolling our own "S3" (engineering.nanit.com)
322 points | 250 comments | page 4
another_twist 2 days ago|
I mean, the "S3" could be replaced with "object storage"; I guess that's the technical term anyway. Having said that, it just goes to show how cheap S3 is if, after all of this, the savings are just $500k. Definitely money saved, but not a lot.
fergie 2 days ago||
I mean fair enough, but I feel like S3 is one of the few AWS products that is actually pretty cheap for what you get.
TimCock 2 days ago||
[flagged]
ch2026 2 days ago||
[flagged]
OsrsNeedsf2P 2 days ago|
It's the government that lost 850TB of citizen data with no backups[0]. Because Cloud bad.

[0] https://www.techradar.com/pro/security/the-south-korean-gove...

codedokode 2 days ago|||
Storing the data in a foreign cloud would allow a foreign nation to play funny tricks on the country. What they need is not the cloud but a sane backup system.
PartiallyTyped 2 days ago|||
Isolated partitions exist.
bpodgursky 2 days ago|||
ap-northeast-2 is literally in Seoul
sekh60 2 days ago||
CLOUD Act.
senectus1 2 days ago|||
because they didn't have a decent backup.
Lucian6 2 days ago|
[flagged]
supriyo-biswas 2 days ago||
Why do all your comments seem LLM-generated? You clearly do have something to contribute, but it's probably better to just write what you're talking about than to run it through an LLM.
deaux 2 days ago|||
They do not have anything to contribute. It's all made up.

> Having worked extensively with battery systems, I think the grid storage potential of second-life EV batteries is more complex than it appears

> Having worked extensively with computer vision models for our interview analysis system,

> Having dealt with similar high-stakes outages in travel tech, the root cause here seems to be their dependency

> Having gone through S3 cost optimization ourselves,

> The surgical metaphor really resonates with my experience building real-time interview analysis systems

The sad news is that very soon it will be slightly less obvious, and then when I call them out just like now I'll be slapped by dang et al. with such accusations being against the HN guidelines. Right now most, like this one, don't care enough, so it's still obvious beyond doubt, to the extent that that doesn't happen.

Unfortunately they're clearly already good enough to fool you and many here.

bee_rider 2 days ago|||
Having screwed around extensively on Internet forums, it's always been pretty difficult to figure out who's just a phony.

But yeah this won’t make it any easier.

supriyo-biswas 2 days ago|||
> I'll be slapped by dang et al. with such accusations being against the HN guidelines

This is also the reason I toned it down a bit. Although I've never received a formal reprimand from dang, he's often dropped by my threads containing such callouts when the original poster of the LLM comment disagreed with my assessment.

pjjpo 2 days ago||||
I don't know about this commenter specifically, but in general, using LLMs to format text is a game changer in the ability of English-as-a-second-language folks to contribute to tech conversations. While I get where some of the bias against anything LLM-generated comes from, I would reserve that bias for editorial content rather than community comments, to be fair to a global audience.
ashdksnndck 2 days ago|||
I’m worried that LLMs could facilitate cheap, scaled astroturfing.

I understand that people encounter discrimination based on English skill, and it makes sense that people will use LLMs to help with that, especially in a professional context. On the other hand, I’d instinctively be more trusting of the authenticity of a comment with some language errors than one that reads like it was generated by ChatGPT.

barrell 2 days ago||||
I'm not sure that's a realistic ask. There is ample abuse of LLM-generated content, and there are plenty of ESL publishers.

Personally I would recommend including a note that English is not your native language and you had an LLM clean things up. I think people are willing to give quite a bit of grace, if it’s disclosed.

Personally, I’d rather see a response in your native language with a translation, but I’m fairly certain I’m the odd one out in that situation XD

nondrool 2 days ago||
You're not alone.
lbreakjai 2 days ago||||
I tried that, but you end up sounding so bland and generic. It feels like the textual equivalent of the Corporate Memphis art style. I'm comfortable doing this at work because I exist outside of Slack/emails, but on here I am what I write. If I delegate this to an LLM, then I do not exist anymore.

What I found useful is to use LLMs as a fuzzy near-synonym search engine. "Sad, but with a note of nostalgia", for example. It's a slower process, which in itself isn't bad.

phito 2 days ago||||
It just makes everything sound bland and soulless. You don't know which part of the message actually comes from the user's brain and which part has been added/suggested by the LLM. The latter is not an original thought, and it would be disingenuous to include it, but people do because it makes them look smarter. Meanwhile, on the other side, you might as well be talking to an LLM...
deaux 2 days ago|||
This commenter is making everything up, and a 3-second look at their profile puts this beyond any doubt. Regardless, the benefit of the doubt should no longer be given. Too bad for my fellow ESLers (I'm one myself), but we'd better just get used to writing in English ourselves. It's already a daily occurrence to see these bots on HN.
LPisGood 2 days ago||||
What do you see about this comment that seems particularly LLM-generated?
bryanrasmussen 2 days ago|||
I wondered myself, as it seemed ok, but I went through the poster's history as I was interested.

Firstly, they have a remarkably consistent style. Everything is like this. There aren't very many examples to choose from, so that's maybe to be expected, and perhaps it is just their personality.

I worry, as I've been accused myself, that there is perhaps something in the style the accuser dislikes or finds off-putting and nowadays the suspected cause will be LLM.

Secondly, they have "extensive experience" in various areas of technology that don't seem to be especially related to each other. I too have extensive experience in several areas of technology, but there is something of a connector between them.

Perhaps it is just because of their high level of technical expertise that they have managed to move between these areas and gain this extensive experience. And because of that high level of technical expertise and their interest in only saying very technical things all the time, their communications seem less varied and human, and more LLM.

BoredPositron 2 days ago||
It's the verbose writing style. I can see why you would be accused as well.
deaux 2 days ago|||
FWIW, the people accusing the person you're replying to would be clearly wrong, as this sentence alone directly rules out its being straight LLM output:

> and nowadays the suspected cause will be LLM.

It's very unlike the original person, who is a bot indeed.

BoredPositron 2 days ago||
I know. The problem is that extremely verbose writing styles get associated with LLMs; it's in the same vein as em-dashes.
bryanrasmussen 2 days ago|||
Going to my last page of comments at this time:

https://news.ycombinator.com/threads?id=bryanrasmussen

I have 4 comments of more than 3 sentences, 3 comments of 2 or 3 sentences, and 5 comments of 1 sentence.

The sentences were generally pretty short.

BoredPositron 2 days ago||
Verbosity isn't just about the length of your comments. It's about using more words than necessary. Sometimes a 'yes' is enough instead of two sentences. It just seems that you like to express your thought process in words. It's not a critique of your writing style; it's just a trait that your writing shares with LLMs.
deaux 2 days ago||||
LLMs are incredibly prone to producing examples and reasons in groups of 3, in an A, B, C pattern. The comment in question does so in almost every paragraph.

> We found that implementing proper data durability (3+ replicas, corruption detection, automatic repair)

> The engineering time spent building and maintaining custom tooling for multi-region replication, access controls, and monitoring ended

And so on. On top of this, a 5-second look at the profile confirms that it's a bot.

They're using a very structured and detailed prompt. The upside of that for them is that their comment looks much more "HN-natural" than 99% of LLM comments on here. The downside is that their comments look even more similar to each other than other bots', which display more variety. That's the tradeoff they're making. Other bots' comments are much more obviously sloppy, but there's more variety across their different comments.

LPisGood 2 days ago||
When I’m giving examples I also aim to give three if at all practical. Language generally flows more naturally that way.
glitchcrab 2 days ago|||
It just has a certain feel to it; by the end of the first paragraph I thought it was written by an LLM too.
rendaw 2 days ago||||
> For high-throughput workloads (>500 req/s), we actually saw better cost efficiency with S3 due to their economies of scale on bandwidth. The breakeven point seems to be around 100-200TB of relatively static data with predictable access patterns. Below that, the operational overhead of running your own storage likely exceeds S3's markup.

I just spent 5 minutes reading this over and over, but it still doesn't make any sense to me. First it says high throughput = S3, low throughput = self-hosted. Then it says low throughput = S3 (therefore high throughput = self-hosted).

YZF 2 days ago|||
Right. Having worked on a commercial S3-compatible storage system, I can tell y'all that there's a lot more to it than just sticking some files on JBOD. It does depend on your specific requirements, though. 1.5 FTE over 18 months sounds on the low side for everything you've described.

That said, the article seems to be more about optimizing their pipeline to reduce S3 usage by holding some objects in memory instead. That's very different from trying to build your own object store to replace S3.
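In miniature, that pattern boils down to "hold short-lived blobs in RAM and fall back to S3 only when you must". A rough sketch of the shape of it, purely as illustration and not their actual design (the class, names, and memory budget are all made up):

    import threading

    import boto3

    class TransientStore:
        """Keep short-lived blobs in process memory; spill to S3 only when
        the in-memory budget is exceeded."""

        def __init__(self, s3_client, bucket, max_bytes=1 << 30):
            self._mem = {}                 # key -> bytes held in RAM
            self._used = 0
            self._max = max_bytes
            self._lock = threading.Lock()
            self._s3 = s3_client
            self._bucket = bucket

        def put(self, key: str, data: bytes) -> None:
            with self._lock:
                if self._used + len(data) <= self._max:
                    self._mem[key] = data
                    self._used += len(data)
                    return
            # over budget: fall back to a real S3 write
            self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

        def get(self, key: str) -> bytes:
            with self._lock:
                if key in self._mem:
                    return self._mem[key]
            return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()

    store = TransientStore(boto3.client("s3"), "example-bucket")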

john01dav 2 days ago|||
There are more options than using S3 or completely rolling your own on JBOD. For example, you could use a cheaper S3-compatible cloud (such as Backblaze) or deploy a project such as Ceph.
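The nice part about "S3-compatible" is that it's usually just a different endpoint on the same client code. A minimal sketch (endpoint, bucket, and credentials below are placeholders, not real values):

    import boto3

    # Same boto3 calls as against AWS; only the endpoint and keys change.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-004.backblazeb2.com",  # provider-specific endpoint
        aws_access_key_id="YOUR_KEY_ID",
        aws_secret_access_key="YOUR_APPLICATION_KEY",
    )

    s3.put_object(Bucket="example-bucket", Key="clips/session-123.mp4", Body=b"...")
    obj = s3.get_object(Bucket="example-bucket", Key="clips/session-123.mp4")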
Twirrim 2 days ago|||
S3 does more than 3x-replica durability as well; they use a form of erasure coding. They can lose several hard drives/servers/racks before your data is at risk, and they have sufficient spare capacity to very quickly rebuild any missing shards before things become a problem.
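Back-of-envelope, the appeal over plain replication looks something like this (k and m are made up for illustration; AWS doesn't publish S3's exact layout):

    # k data shards + m parity shards: any m of the k+m shards can be lost
    # and rebuilt from the rest.
    def raw_storage_multiplier(k: int, m: int) -> float:
        return (k + m) / k

    k, m = 10, 4                          # hypothetical scheme, not S3's real parameters
    print(raw_storage_multiplier(k, m))   # 1.4x raw storage, survives 4 lost shards
    print(raw_storage_multiplier(1, 2))   # 3x replication (1 data + 2 copies), survives 2 losses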

That said, S3 seems like a really odd fit for their workload, plus their dependency on lifecycle rules seems utterly bizarre.

> Storage was a secondary tax. Even when processing finished in ~2 s, Lifecycle deletes meant paying for ~24 h of storage.

They decided not to implement the deletion logic in their service, so they'd just leave files sitting around for hours, needlessly paying that storage cost? I wonder how much money they'd have saved if they had just added that deletion logic.
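For reference, the deletion logic they skipped is basically one extra API call after processing. A sketch, assuming boto3; handle() and the names are placeholders:

    import boto3

    s3 = boto3.client("s3")

    def handle(data: bytes) -> None:
        pass  # stand-in for the ~2s processing step from the article

    def process_and_clean(bucket: str, key: str) -> None:
        obj = s3.get_object(Bucket=bucket, Key=key)
        handle(obj["Body"].read())
        # delete as soon as processing succeeds, instead of paying for ~24h of
        # storage while waiting for a lifecycle rule to fire
        s3.delete_object(Bucket=bucket, Key=key)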

groundzeros2015 2 days ago||
Is spending time to optimize S3 in the manner you describe not a relevant cost?