Bucketsquatting is (finally) dead

Posted by boyter 12 hours ago

Bucketsquatting is (finally) dead (onecloudplease.com)

277 points | 149 comments | page 2

bulbar 9 hours ago|

A name shouldn't be the same as the thing it names.

When a name becomes free and somebody else uses it, it points to another thing. What that means for consumers of the name depends on the context; most likely it means they should stop using it. If you yourself reassign the name you can decide that the new thing will be considered to be identical to the old thing.

PunchyHamster 8 hours ago||

The decision to make the bucket name (and not bucket + account ID surrogate) the sole key for access was one of the most annoying mistakes in S3's design.

alemwjsl 11 hours ago||

I take it advertising your account id isn't a security risk?

otterley 7 hours ago||

AWS does not consider it one.

“While account IDs, like any identifying information, should be used and shared carefully, they are not considered secret, sensitive, or confidential information.” https://docs.aws.amazon.com/accounts/latest/reference/manage...

Cthulhu_ 10 hours ago|||

Armchair opinion, but shouldn't be too bad - it's identification, not authentication, just like your e-mail address is.

But probably best to not advertise it too much.

thenickdude 6 hours ago|||

If you ever produce and share a signed link for e.g. S3, this link contains your access key ID in it. Turns out you can just slice and decode your Account ID out of that access key, it's in there in base32:

https://medium.com/@TalBeerySec/a-short-note-on-aws-key-id-f...
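
A minimal sketch of that decoding, assuming the layout described in the linked write-up: after the 4-character prefix, the rest of the key ID is base32, and the account ID sits in the first 6 decoded bytes, shifted up by 7 bits. The function names and the dummy-key helper are hypothetical, and older-format key IDs may not follow this layout.

```python
import base64


def account_id_from_access_key_id(access_key_id: str) -> str:
    """Recover the 12-digit account ID embedded in an access key ID (assumed layout)."""
    body = access_key_id[4:]                 # drop the 4-character prefix, e.g. AKIA / ASIA
    decoded = base64.b32decode(body)         # 16 base32 characters -> 10 bytes
    value = int.from_bytes(decoded[:6], "big")
    account = (value & 0x7FFFFFFFFF80) >> 7  # discard the top bit and the low 7 bits
    return f"{account:012d}"


def dummy_access_key_id(account_id: str) -> str:
    """Hypothetical helper: build a fake key ID embedding account_id (demo only)."""
    raw = (int(account_id) << 7).to_bytes(6, "big") + bytes(4)
    return "AKIA" + base64.b32encode(raw).decode("ascii")


if __name__ == "__main__":
    key_id = dummy_access_key_id("123456789012")
    print(key_id, "->", account_id_from_access_key_id(key_id))
```

The dummy key is not a real credential; it only exists so the round trip can be run without touching an actual key.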

aduwah 11 hours ago|||

It is not hygienic, but with only the account-id you are fine. In the IAM rules the attacker can always just use a * on their end, so it does not make a difference. You do have to be careful to set proper rules on your (owner) end, though.

arjie 4 hours ago||

Good solution. Thanks for popularizing it.

* Backwards compatible

* Keeps readability

* Solves problem

CafeRacer 8 hours ago||

While I understand where it's coming from I always had something like <bucket_tag>-<9_random_\d\w>
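
A rough sketch of that kind of naming, a readable tag plus nine random characters. It assumes lowercase letters and digits only, since S3 bucket names cannot contain uppercase letters or underscores; the function name and defaults are hypothetical.

```python
import secrets
import string

# Assumption: restrict the suffix to characters that are valid in S3 bucket names.
ALPHABET = string.ascii_lowercase + string.digits


def random_bucket_name(tag: str, suffix_len: int = 9) -> str:
    """Return '<tag>-<random suffix>' using a cryptographically strong RNG."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    name = f"{tag}-{suffix}"
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name must be 3-63 characters, got {len(name)}")
    return name


if __name__ == "__main__":
    print(random_bucket_name("logs"))
```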

calmworm 11 hours ago||

That took a decade to resolve? Surprising, but hindsight is 20/20 I guess.

icedchai 7 hours ago|

Two. S3 has been around since 2006!

Bridged7756 6 hours ago||

I think I'm not getting it. What's the problem if someone else can claim that bucket name? If it's deleted wouldn't the data be deleted too? Or is there something I'm missing?

echoangle 6 hours ago||

I think you can put malicious data in the bucket and "impersonate" the deleted bucket, so old code referencing the bucket uses your data instead of throwing an error (?).

returningfory2 5 hours ago||

Or old code referencing the bucket _writes_ data to it, and the attacker can now read it.

tekla 3 hours ago||

https://www.aquasec.com/blog/bucket-monopoly-breaching-aws-a...

thih9 11 hours ago||

> If you wish to protect your existing buckets, you’ll need to create new buckets with the namespace pattern and migrate your data to those buckets.

My pet conspiracy theory: this article was written by bucket squatters who want to claim old bucket names after AI agents read this and blindly follow.

peanut-walrus 6 hours ago||

Why the hell is this a name suffix instead of just using subdomains?

myapp-123456789012-us-west-2-an

vs myapp.123456789012.us-west-2.s3.amazonaws.com

The manipulations I will need to do to fit into the 63 char limit will be atrocious.
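
To put a number on it, here is a small sketch that builds a name in the shape of the example above and reports how much of the 63 characters is left for the readable prefix. The suffix shape is taken from the example name in this comment; the function name is hypothetical.

```python
MAX_BUCKET_NAME = 63  # S3 bucket name length limit


def namespaced_bucket_name(prefix: str, account_id: str, region: str) -> str:
    """Build '<prefix>-<account_id>-<region>-an', failing if it exceeds 63 chars."""
    suffix = f"-{account_id}-{region}-an"
    budget = MAX_BUCKET_NAME - len(suffix)
    if len(prefix) > budget:
        raise ValueError(
            f"prefix is {len(prefix)} chars, but only {budget} fit for region {region}"
        )
    return prefix + suffix


if __name__ == "__main__":
    print(namespaced_bucket_name("myapp", "123456789012", "us-west-2"))
```

For us-west-2 the suffix takes 26 characters, leaving 37 for the prefix; longer region names leave less.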

cyberax 3 hours ago|

I would guess that it can add one more DNS lookup?

INTPenis 11 hours ago|

I started treating long random bucket names as secrets years ago, ever since I noticed hackers were discovering buckets online that held secrets and healthcare info.

This is where IaC shines.

8organicbits 9 hours ago||

~As far as I know, bucket names are public via certificate transparency logs.~ There are tools for collecting those names. Besides you'd leak the subdomain to (typically) unencrypted DNS when you do a lookup and maybe via SNI.

Edit: crossed out incorrect info

BCM43 9 hours ago|||

I'm pretty sure buckets use star certs and thus the individual bucket names won't be in the transparency logs.

8organicbits 8 hours ago||

Ah you're right, they are always wildcard certs. I think I was mis-remembering https://news.ycombinator.com/item?id=15826906, which guesses names based on CT logs.

In either case, the subdomains you use in DNS requests are not private. Attackers can collect those from passive DNS logs or in other ways.
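
To make that concrete, a small sketch of where the bucket name ends up in each S3 addressing style. With virtual-hosted-style URLs the bucket is part of the hostname, which is what gets sent to the resolver; with path-style URLs it is only in the request path. The helper names and the example bucket are hypothetical.

```python
from urllib.parse import urlsplit


def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Virtual-hosted-style: the bucket name is a subdomain."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


def path_style_url(bucket: str, region: str, key: str) -> str:
    """Path-style: the bucket name is the first path segment."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"


def resolved_hostname(url: str) -> str:
    """The name a client hands to its DNS resolver for this URL."""
    return urlsplit(url).hostname or ""


if __name__ == "__main__":
    for build in (virtual_hosted_url, path_style_url):
        url = build("my-secret-bucket", "us-west-2", "report.csv")
        print(build.__name__, "->", resolved_hostname(url))
```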

embedding-shape 8 hours ago|||

> Besides you'd leak the subdomain to (typically) unencrypted DNS when you do a lookup and maybe via SNI.

"Leak" is maybe a bit exaggerated, although if someone MitM'd you they'd definitely be able to see it. But "leak" makes it seem like it's broadcast somehow, which obviously it isn't.

8organicbits 8 hours ago||

No man-in-the-middle is needed, DNS queries are often collected into large datasets which can be analyzed by threat hunters or attackers. Check out passive DNS https://www.spamhaus.com/resource-center/what-is-passive-dns...

You'd need to check the privacy policy of your DNS provider to know if they share the data with anyone else. I've commonly seen the source IP address considered PII, but not the content of the query. Cloudflare's DNS, for example, shares queries with APNIC for research purposes. https://developers.cloudflare.com/1.1.1.1/privacy/public-dns... Other providers share much more broadly.

embedding-shape 8 hours ago||

> No man-in-the-middle is needed [...] Check out passive DNS

How does one execute this "passive DNS" without quite literally being on the receiving end, or at least sitting in-between the sending and receiving end? You're quite literally describing what I'm saying, which makes it less of a "leak" and more like "others might collect your data, even your ISP", which I'd say would be more accurate than "your DNS leaks".

8organicbits 7 hours ago||

There's a lot of online documentation about passive DNS. Here's one example:

> Passive DNS is a historical database of how domains have resolved to IP addresses over time, collected from recursive DNS servers around the world. It has been an industry-standard tool for more than a decade.

> Spamhaus’ Passive DNS cluster handles more than 200 million DNS records per hour and stores hundreds of billions of records per month, providing you with access to a vast lake of threat intelligence data.

https://www.spamhaus.com/resource-center/what-is-passive-dns...

embedding-shape 7 hours ago|||

> collected from recursive DNS servers around the world

Yes, of course, because those DNS servers are literally receiving the queries, i.e. "receiving the data".

Again, there is nothing "leaking" here. That's like saying you leak what HTTP path you're requesting to a server when you're sending an HTTP request to that server. Of course, that's how the protocol works!

8organicbits 6 hours ago||

I think you are hung up on the word "leak".

Putting a secret subdomain in a DNS query shares it with the recursive resolver, whose privacy policy may permit them to share it with others. This is a common practice and attackers have access to the aggregated datasets. You are correct that third-party web servers or CDNs could share your HTTP path, but I am not aware of any examples and most privacy policies should prohibit them from doing so. If your web server provider or CDN does this, change providers. DNS recursive resolvers are chosen client side, so you can't always choose which one handles the query. Even privacy-focused DNS recursive resolvers share anonymized query data. They remove the source IP address, since it's PII, but still "leak" the secret subdomain.

Any time you send secret data such that it travels to an attacker-visible dataset, it is vulnerable to attack. I call that a leak but we can use a different term.

embedding-shape 5 hours ago||

> I think you are hung up on the word "leak".

What gave you that idea? Maybe because my initial comment started with:

> "Leak" is maybe a bit over-exaggerated...

And continues with why I think so?

I raised this sub-thread specifically because I got hung up on "leak"; that's the entire point of the conversation in my mind.

NetMageSCW 4 hours ago|||

So nothing to do with your DNS queries at all? Why did you bring it up?

XorNot 11 hours ago|||

I just started using hashes for names. The deployment tooling knows the "real" name. The actual deployment hash registers a salt+hash of that name to produce a pseudo-random string name.
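
A minimal sketch of the idea, assuming SHA-256 over salt plus logical name and a truncated hex digest as the deployed name. The function name, the salt handling, and the 32-character length are all hypothetical choices, not necessarily what my tooling does verbatim.

```python
import hashlib


def hashed_bucket_name(logical_name: str, salt: bytes, length: int = 32) -> str:
    """Derive a stable, pseudo-random bucket name from a logical name and a salt."""
    if not 3 <= length <= 63:
        raise ValueError("bucket names must be 3-63 characters long")
    digest = hashlib.sha256(salt + logical_name.encode("utf-8")).hexdigest()
    # Lowercase hex is valid in a bucket name; the full 64-char digest is too long.
    return digest[:length]


if __name__ == "__main__":
    salt = b"per-deployment-secret"  # assumption: kept with the deployment tooling
    print(hashed_bucket_name("billing-exports", salt))
```

The same salt and logical name always produce the same bucket name, so the tooling can recompute it instead of storing a mapping.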

Galanwe 11 hours ago||

This is all well and good on the IaC side, yes. But at the end of the day, buckets are also user-facing resources, and nobody likes random directory / bucket names.

INTPenis 9 hours ago|||

That's a contradiction, a bucket name being treated as a secret in IaC, while being a user facing resource. So no, they're not user facing resources.

If anyone wants them to be user-facing resources, then treat them as such, ensure they're secure, and don't store sensitive info in them. Otherwise, put a service in front of them, and have the user go through it.

The S3 protocol was meant to make the lives of programmers easier, not end users.

amluto 10 hours ago|||

It would be nice if the other end of this could be addressed: a configurable policy to limit resolution of bucket names within an account namespace. Ideally, if someone doesn’t have permission to resolve a bucket name, they shouldn’t even be able to detect whether it exists.
