Reverse Engineering Vercel's BotID

Posted by hazebooth 1 day ago

Reverse Engineering Vercel's BotID (www.nullpt.rs)

102 points | 18 comments

ATechGuy 1 day ago

> At the moment, it seems Basic mode is so basic that it allows everything to pass as human. That’ll likely change as they gather more telemetry to better identify what a bot signal looks like.

So they are basically collecting telemetry in the name of a "free basic anti-bot" solution.

cchance 1 day ago

free basic anti-bot solution that literally NEVER BLOCKS A BOT, like what the actual fuck

codedokode 1 day ago

Note that the bot detection script uses WebGL to obtain the GPU name. I assume this (fingerprinting) is the most popular use of WebGL. It's sad that independent browsers like Firefox do not supply fake values.

nullpt_rs 1 day ago

Sadly, spoofing GPU vendor & renderer can be an even larger flag since they can hash the resulting image of the canvas to compare it with a database of collected fingerprints[0]

[0]: https://research.google/pubs/picasso-lightweight-device-clas...
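To illustrate why a spoofed renderer string can stand out, here is a minimal sketch of the comparison described above. It is not BotID's or Picasso's actual logic: the pixel buffers, the `KNOWN_FINGERPRINTS` table, and the function names are all hypothetical stand-ins, and in a browser the pixels would come from reading back a rendered canvas.

```python
import hashlib


def image_hash(pixels: bytes) -> str:
    """Hash the raw pixel bytes of a rendered test image."""
    return hashlib.sha256(pixels).hexdigest()


# Stand-in pixel buffers (hypothetical). Real GPUs and drivers differ in
# subtle rounding and anti-aliasing, which is what makes the hash useful.
NVIDIA_PIXELS = bytes([10, 20, 30, 255] * 4)
APPLE_PIXELS = bytes([10, 21, 30, 255] * 4)

# Hypothetical database: render hash -> GPU family that produced it.
KNOWN_FINGERPRINTS = {
    image_hash(NVIDIA_PIXELS): "nvidia",
    image_hash(APPLE_PIXELS): "apple",
}


def check_renderer_claim(claimed_renderer: str, pixels: bytes) -> str:
    """Compare the self-reported renderer string with the render hash."""
    family = KNOWN_FINGERPRINTS.get(image_hash(pixels))
    if family is None:
        return "unknown"  # hash never seen before, no conclusion
    return "consistent" if family in claimed_renderer.lower() else "mismatch"
```

With this table, a client that reports "Apple M1" but produces the NVIDIA-shaped image comes back as `mismatch`, which is the extra flag that spoofing can create.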

reaperducer 1 day ago

Until a major player gets on board. Then it works.

Apple does this by sending an imposter user agent from Safari on iPads.

If only that were expanded to iPhones, too, and then extended to send rotating or randomized user agents.

nerdsniper 1 day ago

Apple does it because they don’t have a vested financial interest in internet-wide tracking.

Google does.

And while Mozilla does too because the vast majority of their funding comes from Google, it’s more pertinent that they don’t have the market share to pull this off. Firefox would just stop working on major websites if they did this.

ZebulonP 19 hours ago

Doesn't that just move the goalposts, though? Instead of using your GPU vendor for the fingerprint, they can just hash the output canvas after they make a bunch of odd rendering calls, getting a hash from the quirks of your graphics driver and GPU hardware.

andrewmcwatters 1 day ago

It’s funny that trying to click on the Google Scholar link there falsely identifies me as a bot.

grishka 21 hours ago

IMO the use of <canvas> needs to be behind a permission prompt, the same as e.g. geolocation or WebRTC. Few websites actually need canvas/WebGL for legitimate purposes.

chocolatkey 18 hours ago

This would break way too many websites to be feasible. And if implemented, it would be requested on so many sites that users would learn to automatically say yes, which would weaken the power of permission prompts in general.

For example, almost every major Japanese book/comic site uses canvas in their e-reader

codedokode 14 hours ago

The best solution would be if canvas only allowed displaying pixels on the page but not drawing (meaning you need to bring your own drawing library) so that it would be unusable for fingerprinting.

b0a04gl 1 day ago

why is bot detection even happening at render time instead of request time? why can't they tell you're a bot from your headers, UA, IP, TLS fingerprint? imo this makes it surveillance: 'you're a bot, ok, but we won't just tell you to go away, let's fingerprint your GPU and assign you a behavioral risk score anyway'

n2d4 1 day ago

It's really hard to detect it at request time. It's practically trivial for an attacker to fake headers to resemble a real browser.

baby_souffle 23 hours ago

You absolutely have options at request time. Arguably, some of the things you can only do at request time are part of a full and complete mitigation strategy.

You can fingerprint the originating TCP stack with some degree of confidence. If the request looks like it came from a Linux server but the user agent says Windows, that's a signal.

Likewise, the IP address making the request has geographic information associated with it. If my IP address says I'm in Romania but my browser is asking for the English language version of the page... That's a signal.

Similar to basic IP/Geo, you can do DNS and STUN based profiling, too. This helps you catch people that are behind proxies or VPNs.

To blur the line, you can use JavaScript to measure request timing. Proxies that are going to tamper with the request to hide its origins or change its fingerprint will add a measurable latency.
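The first two checks above could be sketched roughly like this. Everything here is an assumption for illustration: `RequestInfo`, `EXPECTED_LANGUAGES`, and the signal names are hypothetical, and the TCP OS guess and IP country are taken as inputs from whatever passive fingerprinting and geolocation tools you already run.

```python
from dataclasses import dataclass


@dataclass
class RequestInfo:
    user_agent: str
    tcp_os_guess: str     # assumed output of a passive TCP/IP fingerprinter
    ip_country: str       # assumed ISO country code from an IP geo lookup
    accept_language: str  # raw Accept-Language header


# Hypothetical table of languages considered unsurprising per country.
EXPECTED_LANGUAGES = {"RO": {"ro"}, "US": {"en"}, "DE": {"de"}}


def ua_os(user_agent: str) -> str:
    """Very rough OS guess from the User-Agent string."""
    ua = user_agent.lower()
    # Android UAs also contain "linux", so check Android first.
    for needle, os_name in (("windows", "windows"), ("android", "android"),
                            ("mac os x", "macos"), ("linux", "linux")):
        if needle in ua:
            return os_name
    return "unknown"


def request_time_signals(req: RequestInfo) -> list:
    """Return the names of request-time signals that fired."""
    signals = []
    claimed = ua_os(req.user_agent)
    if "unknown" not in (claimed, req.tcp_os_guess) and claimed != req.tcp_os_guess:
        signals.append("os_mismatch")
    # Primary language subtag of the first Accept-Language entry.
    first = req.accept_language.split(",")[0].split(";")[0]
    primary = first.split("-")[0].strip().lower()
    expected = EXPECTED_LANGUAGES.get(req.ip_country)
    if expected is not None and primary and primary not in expected:
        signals.append("language_geo_mismatch")
    return signals
```

A Windows user agent arriving over a Linux-looking TCP stack from a Romanian IP with `en-US` as the first language would trip both signals. These are signals to weigh, not verdicts.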

n2d4 22 hours ago

None of these are conclusive by any means. The IP address check you mentioned would mark anyone using a VPN, or English speakers living abroad. Modern bot detection combines lots of heuristics like these together, and being able to run JavaScript in the browser (at render-time) adds a lot more data that can be used to make a better prediction.
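One simple way to combine weak heuristics is a noisy-OR over per-signal weights, sketched below. This is only an assumption about how such a combination might look, not how any particular vendor scores traffic; the signal names, weights, and threshold are made up.

```python
# Hypothetical weights: the chance that each signal, on its own, means "bot".
SIGNAL_WEIGHTS = {
    "os_mismatch": 0.4,
    "language_geo_mismatch": 0.1,  # weak: VPN users and expats trip this
    "headless_js_marker": 0.7,     # only observable by running JavaScript
}


def bot_score(signals) -> float:
    """Noisy-OR: 1 minus the chance that every fired signal is innocent."""
    p_innocent = 1.0
    for name in signals:
        p_innocent *= 1.0 - SIGNAL_WEIGHTS.get(name, 0.0)
    return 1.0 - p_innocent


def classify(signals, threshold: float = 0.6) -> str:
    """Label a visitor based on the combined score (threshold is arbitrary)."""
    return "bot" if bot_score(signals) >= threshold else "human"
```

Under these made-up weights, the language/geo mismatch alone stays well below the threshold, while adding a render-time signal pushes the score over it.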

cAtte_ 21 hours ago

> If my IP address says I'm in Romania but my browser is asking for the English language version of the page... That's a signal.

jesus christ don't give them ideas. it's annoying enough to have my country's language forced on me (i prefer english) when there's a perfectly good http header for that. now blocking me based on this?!
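For reference, the header in question is `Accept-Language`. A minimal sketch of honoring it instead of guessing from IP might look like the following; the function names and the `default` parameter are hypothetical, and the parsing is simplified compared to full language-range matching.

```python
def parse_accept_language(header: str) -> list:
    """Return language tags ordered by q-value, then by header order."""
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        if quality > 0:
            entries.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(entries)]


def pick_language(header: str, supported: set, default: str = "en") -> str:
    """Pick the best supported language, falling back to the site default."""
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        if tag in supported:
            return tag
        base = tag.split("-")[0]  # "en-us" falls back to "en"
        if base in supported:
            return base
    return default
```

For example, `pick_language("en-US,en;q=0.9,ro;q=0.8", {"ro", "en"})` picks English even when the request comes from a Romanian IP.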

indrora 23 hours ago

Anubis does it pretty decently.

iovoid 20 hours ago

Anubis is not meant to fully stop bots, only slow them down so they don't take down your service. This kind of bot detection is meant to prevent automation.