Posted by todsacerdoti 4/19/2025

The Web Is Broken – Botnet Part 2 (jan.wildeboer.net)
411 points | 274 comments
__MatrixMan__ 4/19/2025|
The broken thing about the web is that in order for data to remain readable, a unique sysadmin somewhere has to keep a server running in the face of an increasingly hostile environment.

If instead we had a content addressed model, we could drop the uniqueness constraint. Then these AI scrapers could be gossiping the data to one another (and incidentally serving it to the rest of us) without placing any burden on the original source.

Having other parties interested in your data should make your life easier (because other parties will host it for you), not harder (because now you need to work extra hard to host it for them).

akoboldfrying 4/20/2025||
Assuming the right incentives can be found to prevent widespread leeching, a distributed content-addressed model indeed solves this problem, but introduces the problem of how to control your own content over time. How do you get rid of a piece of content? How do you modify the content at a given URL?

I know, as far as possible it's a good idea to have content-immutable URLs. But at some point, I need to make www.myexamplebusiness.com show new content. How would that work?

__MatrixMan__ 4/20/2025||
As for how to get rid of a piece of content... I think that one's a lost cause. If the goal is to prevent things that make content unavailable (e.g. AI scrapers), then you end up with a design that prevents things that make content unavailable (e.g. legitimate deletions). The whole point is that you're not the only one participating in propagating the content, and that comes with trade-offs.

But as for updating, you just format your URLs like so: {my-public-key}/foo/bar

And then you alter the protocol so that the {my-public-key} part resolves to the merkle-root of whatever you most recently published. So people who are interested in your latest content end up with a whole new set of hashes whenever you make an update. In this way, it's not 100% immutable, but the mutable payload stays small (it's just a bunch of hashes) and since it can be verified (presumably there's a signature somewhere) it can be gossiped around and remain available even if your device is not.
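
A minimal, self-contained sketch of that scheme (toy in-memory stand-ins for the network, hypothetical names, real signing elided): the only mutable record is a small signed pointer from a public key to the latest merkle root, and everything it points at stays immutable and hash-addressed.

  import hashlib, json

  CONTENT = {}   # hash -> bytes: the immutable, gossip-able part
  POINTERS = {}  # pubkey -> {"root": ..., "sig": ...}: the small mutable part

  def put(data: bytes) -> str:
      h = hashlib.sha256(data).hexdigest()
      CONTENT[h] = data
      return h

  def publish(pubkey: str, tree: dict) -> None:
      # "tree" maps paths to content hashes; its own hash acts as the merkle root
      root = put(json.dumps(tree, sort_keys=True).encode())
      POINTERS[pubkey] = {"root": root, "sig": "signature-over-root"}  # signing elided

  def resolve(url: str) -> bytes:
      pubkey, _, path = url.partition("/")   # "{my-public-key}/foo/bar"
      root = POINTERS[pubkey]["root"]        # a real client verifies the signature here
      tree = json.loads(CONTENT[root])
      return CONTENT[tree[path]]

  # Updating content changes the root hash, not the URL:
  publish("my-public-key", {"foo/bar": put(b"version 1")})
  print(resolve("my-public-key/foo/bar"))    # b'version 1'
  publish("my-public-key", {"foo/bar": put(b"version 2")})
  print(resolve("my-public-key/foo/bar"))    # b'version 2'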

You can soft-delete something just by updating whatever pointed to it to not point to it anymore. Eventually most nodes will forget it. But you can't really prevent a node from hanging on to an old copy if they want to. But then again, could you ever do that? Deleting something on the web has always been a bit of a fiction.

akoboldfrying 4/20/2025||
> But then again, could you ever do that?

True in the absolute sense, but the effect size is much worse under the kind of content-addressable model you're proposing. Currently, if I download something from you and you later delete that thing, I can still keep my downloaded copy; under your model, if anyone ever downloads that thing from you and you later delete that thing, with high probability I can still acquire it at any later point.

As you say, this is by design, and there are cases where this design makes sense. I think it mostly doesn't for what we currently use the web for.

__MatrixMan__ 4/20/2025||
You could only later get the thing if you grabbed its hash while it was still available. And you could only reliably resolve that hash later if somebody (maybe you) went out of their way to pin the underlying data. Otherwise nodes would forget rather quickly, because why bother keeping around unreferenced bits?

It's the same functionality you get with permalinks and sites like archive.org--forgotten unless explicitly remembered by somebody, dynamic unless explicitly a permalink. It's just built into the protocol rather than a feature to be inconsistently implemented over and over by many separate parties.

XorNot 4/20/2025|||
Except no one wants content-addressed data - because if you knew what it was you wanted, then you would already have stored it. The web as we know it is an index - it's a way to discover what data is available, and specifically, we usually want the latest data that's available.

AI scrapers aren't trying to find things they already know exist, they're trying to discover what they didn't know existed.

__MatrixMan__ 4/20/2025|||
Yes, for the reasons you describe, you can't be both a useful web-like protocol and also 100% immutable/hash-linked.

But there's a lot of middle ground to explore here. Loading a modern web page involves making dozens of requests to a variety of different servers, evaluating some javascript, and then doing it again a few times, potentially moving several MB of data. The part people want, the thing you don't already know exists, is hidden behind that rather heavy door. It doesn't have to be that way.

If you already know about one thing (by its cryptographic hash, say) and you want to find out which other hashes it's now associated with--associations that might not have existed yesterday--that's much easier than we've made it. It can be done:

- by moving kB not MB, we're just talking about a tuple of hashes here, maybe a public key and a signature

- without placing additional burden on whoever authored the first thing; they don't even have to be the ones who published the pair of hashes that your scraper is interested in

Once you have the second hash, you can then reenter immutable-space to get whatever it references. I'm not sure if there's already a protocol for such things, but if not then we can surely make one that's more efficient and durable than what we're doing now.
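
A rough sketch of what one of those records could look like (the field names are hypothetical, not an existing protocol): a signed claim that one hash is now associated with another, small enough to gossip freely.

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Edge:
      subject: str    # hash of the thing you already know about
      target: str     # hash of the newly associated thing
      relation: str   # e.g. "comment-on", "new-version-of"
      pubkey: str     # whoever asserts the association
      signature: str  # over (subject, target, relation); verification elided

  def whats_new(known_hash: str, edges: list[Edge], trusted: set[str]) -> list[str]:
      # "Hey, what's new?": hashes now associated with something I already have,
      # asserted by keys I trust. Fetching the content itself is a separate,
      # purely content-addressed step.
      return [e.target for e in edges
              if e.subject == known_hash and e.pubkey in trusted]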

XorNot 4/20/2025||
But we already have HEAD requests and ETags.

It is entirely possible to serve a fully cached response that says "you already have this". The problem is...people don't implement this well.

__MatrixMan__ 4/20/2025||
People don't implement them well because they're overburdened by all of the different expectations we put on them. It's a problem with how DNS forces us to allocate expertise. As it is, you need some kind of write access on the server whose name shows up in the URL if you want to contribute to it. This is how globally unique names create fragility.

If content were handled independently of server names, anyone who cares to distribute metadata for content they care about can do so. One doesn't need write access, or even to be on the same network partition. You could just publish a link between content A and content B because you know their hashes. Assembling all of this can happen in the browser, subject to the user's configs re: who they trust.

akoboldfrying 4/20/2025|||
> because if you knew what it was you wanted, then you would already have stored it.

"Content-addressable" has a broader meaning than what you seem to be thinking of -- roughly speaking, it applies if any function of the data is used as the "address". E.g., git commits are content-addressable by their SHA1 hashes.

__MatrixMan__ 4/20/2025||
But when you do a "git pull" you're not pulling from someplace identified by a hash, but rather a hostname. The learning-about-new-hashes part has to be handled differently.

It's a legit limitation on what content addressing can do, but it's one we can overcome by just not having everything be content addressed. The web we have now is like if you did a `git pull` every time you opened a file.

The web I'm proposing is like how we actually use git--periodically pulling new hashes as a separate action, but spending most of our time browsing content that we already have hashes for.

Timwi 4/20/2025|||
Are there any systems like that, even if experimental?
jevogel 4/20/2025||
IPFS
alakra 4/20/2025||
I had high hopes for IPFS, but even it has vectors for abuse.

See https://arxiv.org/abs/1905.11880 [Hydras and IPFS: A Decentralised Playground for Malware]

__MatrixMan__ 4/20/2025||
Can you point me at what you mean? I'm not immediately finding something that indicates that it is not fit for this use case. The fact that bad actors use it to resist those who want to shut them down is, if anything, an endorsement of its durability. There's a bit of overlap between resisting the AI scrapers and resisting the FBI. You can either have a single point of control and a single point of failure, or you can have neither. If you're after something that's both reliable and reliably censorable--I don't think that's in the cards.

That's not to say that it is a ready replacement for the web as we know it. If you have hash-linked everything then you wind up with problems trying to link things together, for instance. Once two pages exist, you can't after-the-fact create a link between them, because if you update them to contain that link then their hashes change, so now you have to propagate the new hash to people. This makes it difficult to do things like have a comments section at the bottom of a blog post. So you've got to handle metadata like that in some kind of extra layer--a layer which isn't hash-linked and which might be susceptible to all the same problems that our current web is--and then the browser can build the page from immutable pieces, but the assembly itself ends up being dynamic (and likely sensitive to the user's preferences, e.g. dark mode as a browser thing not a page thing).

But I still think you could move maybe 95% of the data into an immutable hash-linked world (think of these as nodes in a graph), the remaining 5% just being tuples of hashes and public keys indicating which pages are trusted by which users, which ought to be linked to which others, which are known to be the inputs and outputs of various functions, and you know... structure stuff (these are our graph's edges).

The edges, being smaller, might be subject to different constraints than the web as we know it. I wouldn't propose that we go all the way to a blockchain where every device caches every edge, but it might be feasible for my devices to store all of the edges for the 5% of the web I care about, and your devices to store the edges for the 5% that you care about... the nodes only being summoned when we actually want to view them. The edges can be updated when our devices contact other devices (based on trust, like you know that device's owner personally) and ask "hey, what's new?"

I've sort of been freestyling on this idea in isolation, probably there's already some projects that scratch this itch. A while back I made a note to check out https://ceramic.network/ in this capacity, but I haven't gotten down to trying it out yet.

areyourllySorry 4/20/2025|||
there is no incentive for different companies to share data with each other, or with anyone really (facebook leeching books?)
__MatrixMan__ 4/20/2025|||
I figure we'd create that incentive by configuring our devices to only talk to devices controlled by people we trust. If they want the data at all, they have to gain our trust, and if they want that, they have to seed the data. Or you know, whatever else the agreement ends up being. Maybe we make them pay us.
reconnecting 4/19/2025||
Residential IP proxies have some weaknesses. One is that they often change IP addresses during a single web session. Second, if IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.

We are working on an open‑source fraud prevention platform [1], and detecting fake users coming from residential proxies is one of its use cases.

[1] https://www.github.com/tirrenotechnologies/tirreno
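
For illustration, a minimal sketch of that session-level signal (hypothetical field names, not tirreno's actual API): real users on a residential line rarely hop between autonomous systems mid-session, while rotating proxies do it constantly.

  def asn_churn(session_events: list[dict]) -> float:
      # session_events: [{"ip": "...", "asn": "AS13335"}, ...] in request order
      asns = [e["asn"] for e in session_events]
      switches = sum(1 for a, b in zip(asns, asns[1:]) if a != b)
      return switches / max(len(asns) - 1, 1)

  def looks_like_rotating_proxy(session_events: list[dict]) -> bool:
      # One signal among many -- never a verdict on its own
      return len({e["asn"] for e in session_events}) > 2 and asn_churn(session_events) > 0.2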

andelink 4/20/2025||
The first blog post in this series[1], linked to at the top of TFA, offers an analysis of the potential of using ASNs to detect such traffic. Their conclusion was that ASNs are not helpful for this use-case, showing that across the 50k IPs they've blocked, there are fewer than 4 IP addresses per ASN, on average.

[1] https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/

reconnecting 4/20/2025||
What was done manually in the first blog post is exactly what tirreno helps achieve by analyzing traffic; here is a live example [1]. Blocking an entire ASN should not be considered a strategy when real users are involved.

Regarding the first post, it's rare to see both datacenter network IPs and mobile proxy IP addresses used simultaneously. This suggests the involvement of more than one botnet. The main idea is to avoid using IP addresses as the sole risk factor. Instead, they should be considered as just one part of the broader picture of user behavior.

[1] https://play.tirreno.com

gruez 4/20/2025|||
>One is that they often change IP addresses during a single web session. Second, if IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.

Both are pretty easy to mitigate with a geoip database and some smart routing. One "residential proxy" vendor even has session tokens so your source IP doesn't randomly jump between requests.

reconnecting 4/20/2025||
And this is the exact reason why IP addresses cannot be considered as the one and only signal for fraud prevention.
gbcfghhjj 4/19/2025||
At least here in the US, most residential ISPs have long leases and change IPs infrequently, on the order of weeks or months.

Trying to understand your product: where is it intended to sit in a network? Is it a standalone tool that you use to identify these IPs and feed into something else for blocking, or is it intended to be integrated into your existing site, or is it supposed to proxy all your web traffic? The reason I ask is that it has fairly heavyweight install requirements, and Apache and PHP are kind of old school at this point, especially for new projects and companies. It's not what they would commonly be using for their site.

reconnecting 4/19/2025||
Indeed, if it's a real user from a residential IP address, in most cases it will be the same network. However, if it's a proxy from residential IPs, there could be 10 requests from one network, the 11th request from a second network, and the 12th back from the first network again. This is a red flag.

Thank you for your question. tirreno is a standalone app that needs to receive API events from your main web application. It can work perfectly well with 512 MB of RAM for Postgres, or even less; however, in most cases we're talking about millions of events, which do require resources.

It's much easier to write a stable application without dependencies, based on mature technologies. tirreno is fairly 'boring software'.

sroussey 4/19/2025||
My phone will be on the home network until I walk out of the house and then it will change networks. This should not be a red flag.
reconnecting 4/20/2025||
Effective fraud prevention relies on both the full user context and the behavioral patterns of known online fraudsters. The key idea is that an IP address cannot be used as a red flag on its own, without considering the broader context of the account. However, if we know that the fraudsters we're dealing with are using mobile network proxies and are randomly switching between two mobile operators, that is certainly a strong risk signal.
JimDabell 4/20/2025||
An awful lot of free Wi-Fi networks you find in malls are operated by different providers. Walking from one side of a mall to the other while my phone connects to all the Wi-Fi networks I’ve used previously would have you flag me as a fraudster if I understand your approach correctly.
reconnecting 4/20/2025||
We are discussing user behavior in the context of a web system. The fact that your device has connected to different Wi-Fi networks doesn't necessarily mean that all of them were used to access the web application.

Finally, as mentioned earlier, there is no silver bullet that works for every type of online fraudster. For example, in some applications, a Tor connection might be considered a red flag. However, if we are talking about HN visitors, many of them use Tor on a daily basis.

sroussey 4/22/2025||
I’ve done a bit of anti-fraud myself and it needs a lack of privacy to work well. Well fingerprinted == less fraud. Sigh.

I've found Tor browsing OK, but login via Tor to just be a great alternative to snowshoeing credential stuffing.

Pesthuf 4/19/2025||
We need a list of apps that include these libraries, and any malware scanner - including Windows Defender, Play Protect and whatever Apple calls theirs - needs to put infected applications into quarantine immediately. Just because it's not directly causing damage to the device the malware is running on doesn't mean it's not malware.
philippta 4/19/2025||
Apps should be required to ask for permission to access specific domains, similar to the tracking protection Apple introduced a while ago.

Not sure how this could work for browsers, but the other 99% of apps I have on my phone should work fine with just a single permitted domain.

snackernews 4/20/2025|||
My iPhone occasionally displays an interrupt screen to remind me that my weather app has been accessing my location in the background and to confirm continued access.

It should also do something similar for apps making chatty background requests to domains not specified at app review time. The legitimate use cases for that behaviour are few.

klabb3 4/20/2025||||
On the one hand, yes, this could work for many cases. On the other hand, goodbye p2p. Not every app is a passive client-server request-response. One needs to be really careful when designing permission systems. Apple has already killed many markets before they had a chance to even exist, such as companion apps for watches and other peripherals.
kmeisthax 4/20/2025|||
P2P was practically dead on iPhone even back in 2010. The whole "don't burn the user's battery" thing precludes mobile phones doing anything with P2P other than leeching off of it. The only exceptions are things like AirDrop; i.e. locally peer-to-peer things that are only active when in use and don't try to form an overlay or mesh network that would require the phone to become a router.

And, AFAIK, you already need special permission for anything other than HTTPS to specific domains on the public Internet. That's why apps ping you about permissions to access "local devices".

zzo38computer 4/20/2025||
> other than HTTPS to specific domains on the public Internet

They should need special permission for that too.

Pesthuf 4/20/2025||||
Maybe there could be a special entitlement that Apple's reviewers would only grant to applications that have a legitimate reason to require such connections. Then only applications granted that permission would be able to make requests to arbitrary domains / IP addresses.

That's how it works with other permissions most applications should not have access to, like accessing user locations. (And private entitlements third party applications can't have are one way Apple makes sure nobody can compete with their apps, but that's a separate issue.)

nottorp 4/20/2025|||
> On the other hand, good bye p2p.

You mean, goodbye to using my bandwidth without my permission? That's good. And if I install a bittorrent client on my phone, I'll know to give it permission.

> such as companion apps for watches and other peripherals

That's just apple abusing their market position in phones to push their watch. What does it have to do with p2p?

klabb3 4/20/2025||
> using my bandwidth without my permission

What are you talking about?

> What does it have to do with p2p?

It's an example of how, when you design sandboxes/firewalls, it's very easy to assume all apps are one big homogeneous blob doing REST calls and that everything else is malicious or suspicious. You often need strange permissions to do interesting things. Apple gives themselves these perms all the time.

nottorp 4/20/2025||
Wait, why should applications be allowed to do REST calls by default?

> What are you talking about?

That's the main use case for p2p in an application, isn't it? Reducing the vendor's bandwidth bill…

klabb3 4/21/2025||
> That's the main use case for p2p in an application, isn't it? Reducing the vendor's bandwidth bill…

The equivalent would be to say that running local workloads or compute is there to reduce the vendor's bill. It's a very centralized view of the internet.

There are many reasons to do p2p, such as improving bandwidth and latency, circumventing censorship, improving resilience, and more. WebRTC is a good example of p2p used by small and large companies alike. None of this is any more "without permission" than a standard app phoning home and tracking your fingerprint and IP.

nottorp 4/21/2025||
Oh, funny you should pick WebRTC. Back when I was still using Chrome, it prevented my desktop from sleeping because 'WebRTC has active peer connections'. With no indication of which page that was happening on.

Great respect for the user's resources.

klabb3 4/21/2025||
Haha yeah I personally hate WebRTC. It’s a mess and I’ve literally rewritten the parts of it I need in order to avoid it. (Check my profile)

I just brought it up as a technology that at the very least is both legitimate and common.

udev4096 4/20/2025||||
Android is so fucking anti-privacy that they still don't have an INTERNET access revoke toggle. The one they have currently is broken and can easily be bypassed with Google Play services (another highly privileged process running for no reason other than to sell your soul to Google). Luckily, GrapheneOS has this toggle. Whenever you install an app, you can revoke INTERNET access at the install screen, and there is no way that app can bypass it.
mjmas 4/20/2025||
Asus added this to their phones which is nice.
zzo38computer 4/20/2025||||
I think capability-based security with proxy capabilities is the way to do it; this would make it possible for the proxy capability to intercept the request and ask permission, or to do whatever else you want it to do (e.g. redirect, log any accesses, automatically allow or disallow based on a file, use or ignore the DNS cache, etc.).

The system may have some such functions built in, and asking permission might be a reasonable thing to include by default.
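
A minimal sketch of the idea, with hypothetical interfaces: the app never touches raw sockets, only a capability object the system hands it, and that object can be a proxy that asks, logs, denies, throttles, or fakes errors.

  import urllib.request

  class NetCapability:
      def fetch(self, host: str, path: str) -> bytes:
          raise NotImplementedError

  class RealNet(NetCapability):
      def fetch(self, host: str, path: str) -> bytes:
          with urllib.request.urlopen(f"https://{host}{path}") as resp:
              return resp.read()

  class GuardedNet(NetCapability):
      # A proxy capability: policy lives here, not in the app
      def __init__(self, inner: NetCapability, allowed: set[str]):
          self.inner, self.allowed = inner, allowed
      def fetch(self, host: str, path: str) -> bytes:
          if host not in self.allowed:      # could instead prompt the user, log,
              raise PermissionError(host)   # slow the request down, or fake an error
          return self.inner.fetch(host, path)

  # The system, not the app, decides which capability the app receives:
  app_net = GuardedNet(RealNet(), allowed={"api.example.com"})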

XorNot 4/20/2025||
Try actually using a system like this. OpenSnitch and LittleSnitch do it for Linux and MacOS respectively. Fedora has a pretty good interface for SELinux denials.

I've used all of them, and it's a deluge: it is too much information to reasonably react to.

Your broad options are either deny or accept, but there's no sane way to reliably know what you should do.

This is not and cannot be an individual problem: the easy part is building high fidelity access control, the hard part is making useful policy for it.

zzo38computer 4/20/2025||
I suggested proxy capabilities so that they can easily be reprogrammed and reconfigured; if you want to disable this feature, you can do that too. It is not only allow or deny; other things are also possible (e.g. simulate various error conditions, artificially slow down the connection, go through a proxy server, etc.). (This proxy capability system would be useful for stuff other than network connections too.)

> it is too much information to reasonably react to.

Even if it asks, that does not necessarily mean it has to ask every time, if the user lets it keep the answer (either for the current session or until the user deliberately deletes this data). Also, if it asks too much because it tries to access too many remote servers, then it might be spyware, malware, etc. anyway, and is worth investigating in case that is what it is.

> the hard part is making useful policy for it.

What the default settings should be is a significant issue. However, changing the policies in individual cases for different uses, is also something that a user might do, since the default settings will not always be suitable.

If whoever manages the package repository, app store, etc. is able to check for malware, then this is a good thing to do (although it should not prohibit the user from installing their own software and modifying the existing software), but security on the computer is also helpful, and neither of these is a substitute for the other; they work together.

vbezhenar 4/20/2025||||
Do you suggest outright forbidding TCP connections for user software? Because you can compile OpenSSL or any other TLS library and make a TCP connection to port 443 that will be opaque to the operating system. They can do wild things like kernel-level DPI on outgoing connections to find out the host, but that quickly turns into a ridiculous competition.
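
Roughly what that looks like from userspace (a sketch, with example.org as a stand-in): the OS only sees an outbound TCP connection to port 443, and recovering the destination name means inspecting the TLS handshake (SNI), i.e. exactly the DPI competition described.

  import socket, ssl

  ctx = ssl.create_default_context()
  with socket.create_connection(("example.org", 443)) as tcp:
      # The TLS session is set up entirely in userspace (here Python's ssl module,
      # but it could just as well be a statically linked OpenSSL inside any app)
      with ctx.wrap_socket(tcp, server_hostname="example.org") as tls:
          tls.sendall(b"GET / HTTP/1.1\r\nHost: example.org\r\nConnection: close\r\n\r\n")
          print(tls.recv(200))
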
internetter 4/20/2025||
> but that quickly turns into ridiculous competition.

Except the platform providers hold the trump card. Fuck around, if they figure it out you'll be finding out.

tzury 4/20/2025||||
The vast majority of revenue in the mobile app ecosystem is ads, which are by design pulled from third parties (and are part of the broader problem discussed in this post).

I am waiting for Apple to enable /etc/hosts or something similar on iOS devices.

jay_kyburz 4/19/2025|||
Oh, that's an interesting idea. A local DNS where I have to add every entry. A whitelist rather than Australia's national blacklist.
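
A toy sketch of that whitelist resolver (names and addresses are purely illustrative): anything not explicitly added simply doesn't resolve.

  ALLOWED = {
      "news.ycombinator.com": "198.51.100.10",   # illustrative addresses only
      "jan.wildeboer.net": "203.0.113.7",
  }

  def resolve(hostname: str) -> str:
      try:
          return ALLOWED[hostname]
      except KeyError:
          raise LookupError(f"NXDOMAIN: {hostname!r} is not on the whitelist")
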
at0mic22 4/19/2025||
Strange that HolaVPN, i.e. Brightdata, is not mentioned. They've been using users' hosts for those purposes for decades, and also selling proxies en masse. Fun fact: they don't have any servers for the VPN. All the VPN traffic is routed through ... other users!
andelink 4/20/2025||
Hola is mentioned in the author's prior post on this topic, linked to at the top of TFA: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
arewethereyeta 4/19/2025|||
They were even the first to do it, and the most litigious of all, trying to push patents on everything possible, even on water if they could.
Klonoar 4/19/2025||
Is it really strange if the logo is right there in the article?
reincoder 4/20/2025||
I work for IPinfo (a commercial service). We offer a residential proxy detection service, but it costs money.

If you are being bombarded by suspicious IP addresses, please consider using our free service and blocking IP addresses by ASN or country. I think ASN is a common parameter for malicious IP addresses. If you do not have time to explore our services/tools (it is mostly just our CLI: https://github.com/ipinfo/cli), simply paste the IP addresses (or logs) in plain text, send them to me, and I will let you know the ASNs and corresponding ranges to block.

throwaway74663 4/20/2025|
Blocking countries is such a poorly disguised form of racism. Funny how it's always the brown / yellow people countries that get blocked, and never the US, despite it being one of the leading nations in malicious traffic.
reincoder 4/21/2025||
Oh, absolutely not — I have to respectfully but strongly disagree with that sentiment.

In cybersecurity, decisions must be guided by objective data, not assumptions or biases. When you’re facing abuse, you analyze the IPs involved and enrich them with context — ASN, country, city, whether it’s VPN, hosting, residential, etc. That gives you the information you need to make calculated decisions: Should you block a subnet? Rate-limit it? CAPTCHA-challenge it?

Here’s a small snapshot from my own SSH honeypot:

Summary of 1,413 attempts

  - Hosting IPs: 981 (69%)
  - VPNs: 35
  - Top ASNs:
    - AS204428 (SS-Net): 152
    - AS136052 (PT Cloud Hosting Indonesia): 83
    - AS14061 (DigitalOcean): 76
  - Top Countries:
    - Romania: 238 (16.8%)
    - United States: 150 (10.6%)
    - China: 134 (9.5%)
    - Indonesia: 115 (8.1%)

One single /24 from Romania accounts for over 10% of the attacks. That’s not about nationality or ethnicity — it's about IP space abuse from a specific network. If a network or country consistently shows high levels of hostile traffic and your risk tolerance justifies it, blocking or throttling it may be entirely reasonable.

Security teams don’t block based on "where people come from" — they block based on where the attacks are coming from.

We even offer tools to help people explore and understand these patterns better. But if someone doesn’t have the time or resources to do that, I'm more than happy to assist by analyzing logs and suggesting reasonable mitigations.

arewethereyeta 4/22/2025||
You should block abusers, not an entire country, based on a few actors. You can spin this as much as you like; it is still a country block, and that country is an incredible pool of IT talent and legitimate users. While we're at it, you could also block the United States for your IPinfo business, since all stats indicate the US is the number one source of fraud on the internet if we're talking IP addresses, which your business does. Let us know how that goes.

I hope nobody does cybersecurity in 2025 by analysing and enriching IP addresses. Not in a market where a single residential proxy provider (which you fail to identify) offers 150M+ exit nodes. Even JA3 fingerprinting could be more useful than looking at IP addresses. I bet those Romanian IPs were not operated by Romanians; yet you're banning all Romanians?

reincoder 4/22/2025||
The kind of blocking I'm referring to is IP metadata-based, not blanket country bans. I specifically mentioned that a single `/24` subnet was responsible for ~10% of brute-force attempts in my honeypot. That doesn’t mean I’d block all of Romania — obviously, the Romanian IP space is vastly larger — but it does raise questions about specific ASNs and IP ranges. In this case, Romanian IPs accounted for 16.8% of total attacks. That’s statistically significant and calls for deeper analysis, not assumptions.

Cybersecurity is a probabilistic game. You build a threat model based on your business, audience, and tolerance for risk. Blocking combinations of metadata — such as ASN, country, usage type, and VPN/proxy status — is one way to make informed short-term mitigations while preserving long-term accessibility. For example:

If an ASN is a niche hosting provider in Indonesia, ask: “Do I expect real users from here?”

If a /24 from a single provider accounts for 10% of your attacks, ask: “Do I throttle it or add a CAPTCHA?”

The point isn’t to permanently ban regions or people. It’s to reduce noise and protect services while staying responsive to legitimate usage patterns.

As for IP enrichment — yes, it's still extremely relevant in 2025. Just like JA3, TLS fingerprinting, or behavioral patterns — it's one more layer of insight. But unlike opaque “fraud scores” or black-box models, our approach is fully transparent: we give you raw data, and you build your own model.

We intentionally don’t offer fraud scoring or IP quality scores. Why? Because we believe it reduces agency and transparency. It also risks penalizing privacy-conscious users just for using VPNs. Instead, we let you decide what “risky” means in your own context.

We’re deeply committed to accuracy and evidence-based data. Most IP geolocation providers historically relied on third-party geofeeds or manual submissions — essentially repackaging what networks told them. We took a different route: building a globally distributed network of nearly 1,000 probe servers to generate independent, verifiable measurements for latency-based geolocation. That’s a level of infrastructure investment most providers haven’t attempted, but we believe it's necessary for reliability and precision.

Regarding residential proxies: we’ve built our own residential proxy detection system (https://ipinfo.io/products/residential-proxy) from scratch, and it’s maturing fast. One provider may claim 150M+ exit nodes, but across a 90-day rolling window, we’ve already observed 40,631,473 unique residential proxy IPs — and counting. The space is noisy, but we’re investing heavily in research-first approaches to bring clarity to it.

IP addresses aren't perfect, but nothing is! With the right context, they're still one of the most powerful tools available for defending services at the network layer. We provide the context and you build the solution.

armchairhacker 4/19/2025||
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.

Why jump to that conclusion?

If a scraper clearly advertises itself, follows robots.txt, and has reasonable backoff, it's not abusive. You can easily block such a scraper, but then you're encouraging stealth scrapers because they're still getting your data.

I'd block the scrapers that try to hide and waste compute, but deliberately allow those that don't. And maybe provide a sitemap and API (which besides being easier to scrape, can be faster to handle).
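
A sketch of such a well-behaved scraper (URL and user agent are placeholders): identify yourself, honor robots.txt, and back off between requests.

  import time
  import urllib.request
  from urllib import robotparser

  UA = "ExampleResearchBot/1.0 (+https://example.org/bot-info)"
  rp = robotparser.RobotFileParser("https://example.org/robots.txt")
  rp.read()

  def polite_get(url: str) -> bytes | None:
      if not rp.can_fetch(UA, url):
          return None                              # respect Disallow rules
      req = urllib.request.Request(url, headers={"User-Agent": UA})
      with urllib.request.urlopen(req) as resp:
          body = resp.read()
      time.sleep(rp.crawl_delay(UA) or 5.0)        # honor Crawl-delay, else back off 5s
      return body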

amiga-workbench 4/19/2025||
What is the point of app stores holding up releases for review if they don't even catch obvious malware like this?
_Algernon_ 4/19/2025||
They pretend to do a review to justify their 30% cartel tax.
klabb3 4/20/2025||
Oh no, they review thoroughly, to make sure you don’t try to avoid the tax.
wyck 4/20/2025|||
This isn't obvious: 99% of apps make multiple calls to multiple services, and these SDKs are embedded into the app. How can you tell what's legit outbound/inbound? Doing a fingerprint search for the worst culprits might help catch some, but it would likely be a game of cat and mouse.
nottorp 4/20/2025||
> How can you tell what's legit outbound/inbound?

If the app isn't a web browser, none are legit?

wyck 4/22/2025||
99.9% of apps on the App Store connect to the network for a multitude of reasons. Do you really think only browsers connect to the internet? Do you not have an app on your phone?
politelemon 4/19/2025|||
Their marketing tells you it's for protection. What they omit is that it's for their revenue protection - observe that as long as you do not threaten their revenue models, or the revenue models of their partners, you are allowed through. It has never been about the users or developers.
charcircuit 4/19/2025|||
The definition of malware is fuzzy.
SoftTalker 4/19/2025|||
Money
arewethereyeta 4/19/2025||
I have had some success in catching most of them at https://visitorquery.com
lq9AJ8yrfs 4/19/2025||
I went to your website.

Is the premise that users should not be allowed to use vpns in order to participate in ecommerce?

arewethereyeta 4/19/2025||
Nobody said that; it's your choice to take whatever action fits your scenario. I have clients where VPNs are blocked, yes; it depends on the industry, fraud rate, chargeback rates, etc.
ivas 4/19/2025|||
Checked my connection via VPN by Google/Cloudflare WARP: "Proxy/VPN not detected"
arewethereyeta 4/19/2025||
Could be, I don't claim 100% success rate. I'll have a look at one of those and see why I missed it. Thank you for letting me know.
nickphx 4/20/2025||
Measuring latency between different endpoints? I see the WebRTC TURN relay request...
pton_xd 4/19/2025||
I thought the closed-garden app stores were supposed to protect us from this sort of thing?
20after4 4/19/2025||
That's what they want you to think.
whstl 4/19/2025|||
Once again this demonstrates that closed gardens only benefit the owners of the garden, and not the users.

What good is all the app vetting and sandbox protection in iOS (dunno about Android) if it doesn't really protect me from those crappy apps...

BlueTemplar 4/19/2025|||
Also my reaction when the call is for Google, Apple, and Microsoft to fix this: DDoS being illegal, shouldn't the first reaction instead be to contact law enforcement?

If you treat platforms like they are all-powerful, then that's what they are likely to become...

20after4 4/19/2025||||
At the very least, Apple should require conspicuous disclosure of this kind of behavior that isn't just hidden in the TOS.
musicale 4/19/2025|||
Sandboxing means you can limit network access. For example, on Android you can disallow Wi-Fi and cellular access (not sure about Bluetooth) on a per-app basis.

Network access settings should really be more granular for apps that have a legitimate need.

App store disclosure labels should also add network usage disclosure.

kibwen 4/19/2025|||
If you find yourself in a walled garden, understand that you're the crop being grown and harvested.
areyourllySorry 4/20/2025|
further reading

https://krebsonsecurity.com/?s=infatica

https://krebsonsecurity.com/tag/residential-proxies/

https://spur.us/blog/

https://bright-sdk.com/ <- way bigger than infatica
