I’ve banned query strings

Posted by susam 22 hours ago

I’ve banned query strings (chrismorgan.info)

459 points | 238 comments

casey2 2 hours ago

YouTube couldn't do that

codingclaws 16 hours ago

I was just wondering if I should do something like this. I use a couple of query string values, and I validate them and issue a 40x if a value is invalid. So I was wondering if I should also issue a 40x for an unused query string value.
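A minimal sketch of the strict approach described above, assuming a hypothetical allow-list of parameters (`page` and `sort` are made-up names, not taken from the comment) and using only Python's standard `urllib.parse`. Whether to answer 400 or 404 is a policy choice; 400 is used here purely for illustration.

```python
from urllib.parse import parse_qsl

# Hypothetical allow-list: parameter name -> validator for its value.
ALLOWED_PARAMS = {
    "page": str.isdigit,
    "sort": lambda value: value in {"new", "top"},
}


def status_for_query(query: str) -> int:
    """Return 200 if every parameter is known and valid, otherwise 400."""
    for name, value in parse_qsl(query, keep_blank_values=True):
        validator = ALLOWED_PARAMS.get(name)
        # Unknown parameters are rejected just like invalid values.
        if validator is None or not validator(value):
            return 400
    return 200
```

With this sketch, `status_for_query("page=2&sort=top")` is accepted, while an unexpected parameter such as `utm_source=x` is rejected in the same way as an invalid `page=abc`.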

gwern 19 hours ago

Query strings break unpredictably, and that alone is enough to justify banning third parties from adding them, especially for something as minor as referral tracking.

Example: The Browser is a well-known paid link-aggregation periodical. I subscribe, and roughly 1 in every 10 or 20 links I clicked would just break outright, and I'd have to tediously edit the URL to fix it (assuming the website didn't do a silent ninja URL edit and make it impossible for me to remember what URL I opened possibly days or weeks ago in a tab and potentially fix it). This was annoying enough to bother me regularly, but not enough to figure out a workaround.

Why? ...Because TB was injecting a '?referrer=The_Browser' or something, and the receiving website server got confused by an invalid query and errored out. 'Wow, how careless of The Browser! Are they really so incompetent as to not even check their URLs before mailing an issue out to paying subscribers?'

I wondered the same thing, and I eventually complained to them. It turns out, they did check all their URLs carefully before emailing them out... emphasis on 'before', which meant that they were checking the query-string-free versions, which of course worked fine. (This is a good example of a testing failure caused by a lack of end-to-end or integration testing: they should have been testing draft emails sent to a testing account, to check for all possible issues like MIME mangling, not just query string shenanigans.)

After that they fixed it by making sure they injected the query string before they checked the URLs. (I suggested not injecting it at all, but they said that for business reasons, it was too valuable to show receiving websites exactly how much traffic TB was driving to them on net, because referrers are typically stripped from emails and reshares and just in general - this, BTW, is why the OP suggestion of 'just set an HTTP referrer header!' is naive and limited to very narrow niches where you can be sure that you can, in fact, just set the referrer header.)
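The fix described above (inject first, then check) might look roughly like this. It is a sketch, not The Browser's actual tooling: `with_referrer`, `broken_links`, and the `referrer=The_Browser` parameter are assumptions based on the "or something" in the comment, and `fetch_status` is a hypothetical callable standing in for a real HTTP request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_referrer(url: str, name: str = "referrer", value: str = "The_Browser") -> str:
    """Append a tracking parameter, keeping any existing query and fragment."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    # Caveat: re-encoding normalises the query (a bare "?a" becomes "?a="),
    # which is itself one of the ways an injected parameter can break a link.
    return urlunsplit(parts._replace(query=urlencode(query)))


def broken_links(urls, fetch_status):
    """Check the URLs exactly as readers will receive them, not the clean ones."""
    tagged = [with_referrer(url) for url in urls]
    return [url for url in tagged if fetch_status(url) != 200]
```

The point of the design is only the ordering: the status check runs on the tagged URL. As noted in the thread, a link that passes at send time can still break later.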

But this error was affecting them for god knows how long and how many readers and how many clicks, and they didn't know. Because why would they? The most important thing any programmer or web dev should know about users is that "they may never tell you": https://pointersgonewild.com/2019/11/02/they-might-never-tel... (excerpts & more examples: https://gwern.net/ref/chevalier-boisvert-2019 ). No matter how badly broken a feature or service or URL may be, the odds are good that no user will ever tell you that. Laziness, public goods, learned helplessness / low standards, I don't know what it is, but never assume that you are aware of severe breakage (or vice-versa, as a user, never assume the creator is aware of even the most extreme problem or error).

Even the biggest businesses.... I was watching a friend the other day try to set up a bank account in Central America, and clicking on one of the few banks' websites to download the forms on their main web page. None of the form PDF download links worked. "That's not a good sign", they said. No, but also not as surprising as you might think - the bank might have no idea that some server config tweak broke their form links. After all, at least while I was watching, my friend didn't tell them about their problem either!

gojomo 16 hours ago

I don't see how your example, The Browser (thebrowser.com), supports your argument that ad-hoc query-string additions are so prone-to-breaking that 3rd parties should ban them.

In fact, the example seems to suggest the opposite: a 17+ year successful paid subscription business – to which you appear to be a generally-satisfied customer! – receives enough "business value" from the practice, despite its failure modes, they don't want to stop. Improving their probe of the risk-of-failure was enough.

Seemingly, the practice works often enough, pleasing more destination sites than it angers, that "referral tracking" is not something "so minor".

gwern 15 hours ago

> Improving their probe of the risk-of-failure was enough.

The point was it was dangerous in a way they didn't even realize was an issue, for a thin business rationale. Unless you are going to do thorough tests and understand the risk you are taking (which they did not, as evidenced by screwing it up systematically at scale for years), you should not be doing it.

And it's not obvious that they are correct in their tightened-up testing, because even if a link is correct at the time they test it, it could break at any time thereafter.

> to which you appear to be a generally-satisfied customer!

No matter what _X_ is, _X_ would have to be a pretty epic screwup to make a customer unsubscribe solely over that! I never claimed it was such a major epic screwup that it could do that. So that is an unreasonable criterion: "well, you didn't outright quit, so I guess it can't be that bad." Indeed, but I never said it was, and somewhat bad is still bad; I was in fact fairly annoyed by the random breakage, and at the margin, everything matters. If TB did a few other things, in sum, they could potentially convince me to let my subscription lapse. An annoyance here, a papercut there, and pretty soon a generally-satisfied customer is no longer so satisfied...

notlive 18 hours ago

Referrer is sometimes nice to know. If your site gets a traffic spike from an email newsletter, that traffic won't correctly identify its source in the HTTP headers.

No qualms with OP, your site your rules.

donohoe 14 hours ago

A neat and funny idea - but in the end it is hostile to the users who don’t always control what’s added to links.

llimllib 12 hours ago

> curl, for example, seems to illegitimately strip a trailing question mark (could be only for the command line, didn’t test library usage).

Umm, what? I don't know what they're actually sending that makes them think this, but if you think curl is broken, you should consider that maybe you're the one doing something wrong.

Here are some examples showing curl not stripping question marks (obviously). I am very curious what this person was actually seeing:

    $ curl -s 'https://httpbingo.org/get?' | jq .url
    "https://httpbingo.org/get?"
    $ curl -s 'https://httpbingo.org/get?path' | jq .url
    "https://httpbingo.org/get?path"
    $ curl -s 'https://httpbingo.org/get?path,query=bananas' | jq .url
    "https://httpbingo.org/get?path,query=bananas"
    $ curl -s 'https://httpbingo.org/get????' | jq .url
    "https://httpbingo.org/get????"
    $ curl -sv 'https://httpbingo.org/????' 2>&1 | grep :path
    * [HTTP/2] [1] [:path: /????]

chrismorgan 11 hours ago

  $ curl -s 'https://httpbingo.org/get?' | jq .url
  "https://httpbingo.org/get"

This may require further investigation.

susam 7 hours ago

From Debian 13.2 (Trixie) + Bash + curl 8.14.1:

  $ curl -s 'https://httpbingo.org/get?' | jq .url
  "https://httpbingo.org/get"

But on macOS + Bash/Zsh + curl 8.7.1:

  $ curl -s 'https://httpbingo.org/get?' | jq .url
  "https://httpbingo.org/get?"

I see some related changes here: https://github.com/curl/curl/commit/3eac21d

Groxx 10 hours ago

Might be shell expansion? zsh uses `?` for filename expansion, others might as well: https://zsh.sourceforge.io/Doc/Release/Expansion.html#Filena...

Though I forget if any shell does stuff like that in quotes. Or printing oddities.

chrismorgan 7 hours ago

No, it’s definitely curl that’s doing it.

  $ echo 'https://httpbingo.org/get?'
  https://httpbingo.org/get?
  $ python
  >>> import json
  >>> import subprocess
  >>> json.loads(subprocess.run(['curl', '-s', 'https://httpbingo.org/get?'], stdout=subprocess.PIPE).stdout)['url']
  'https://httpbingo.org/get'

I’m using curl 8.20.0-3, Arch Linux, x86_64.

  $ curl --version
  curl 8.20.0 (x86_64-pc-linux-gnu) libcurl/8.20.0 OpenSSL/3.6.2 zlib/1.3.2 brotli/1.2.0 zstd/1.5.7 libidn2/2.3.8 libpsl/0.21.5 libssh2/1.11.1 nghttp2/1.69.0 ngtcp2/1.22.1 nghttp3/1.15.0 mit-krb5/1.21.3
  Release-Date: 2026-04-29
  Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns mqtt mqtts pop3 pop3s rtsp scp sftp smtp smtps telnet tftp ws wss
  Features: alt-svc brotli GSS-API HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd

legitster 18 hours ago

Query strings are awesome. Especially for one-page applications.

I build a lot of internal applications, and one of my golden UI rules is that a user should be able to share their URL and other users should be able to see exactly what the sender did.

So if you have a dashboard or visualization where the user can add filters or configurations, I have all of their settings saved automatically in the URL. It's visible, it's obvious, it's easy, it's convenient.
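As a rough illustration of keeping dashboard settings in the URL, here is a minimal round-trip sketch using Python's standard `urllib.parse`. The function names, the `dash.example` host, and the `region`/`year` filters are all hypothetical; a real single-page app would do the equivalent in the browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(base: str, filters: dict[str, list[str]]) -> str:
    """Serialise the current filters into a shareable URL."""
    # Sorting the keys gives the same URL for the same settings.
    query = urlencode(sorted(filters.items()), doseq=True)
    return f"{base}?{query}" if query else base


def url_to_state(url: str) -> dict[str, list[str]]:
    """Recover the filters from a URL that someone shared."""
    return parse_qs(urlsplit(url).query)


filters = {"region": ["eu", "us"], "year": ["2024"]}
shared = state_to_url("https://dash.example/report", filters)
restored = url_to_state(shared)  # equal to `filters` for this input
```

Anyone opening `shared` can rebuild the same view from `restored`, which is the property the comment describes.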

> There is also a moral question here about whether it is okay to modify a given URL on behalf of the user in order to insert a referral query string into it. I think it isn't.

These dogmatic technical screeds are all so weird to me. They usually reveal more about the author's lack of experience or imagination than provide a useful truism.

keane 18 hours ago

Yes, query strings often enable useful features! But Chris's post, "no unauthorised query strings", is only regarding third parties adding them.

legitster 18 hours ago

But... like... that's a weird hill to die on.

> If I wanted to know I’d look at the Referer header; and if it isn’t there, it’s probably for a good reason. You abuse your users by adding that to the link.

The reason is that the referrer headers are a usability and privacy nightmare. It's weird for the author to jump to such a conclusion.

This referral information is being added purely as a courtesy to the web host. If we imagined a world in which ChatGPT or Wikipedia launched massive hugs of death on referral links without attributing themselves, that would be a much, much worse outcome.

kyralis 15 hours ago

There's a referrer header, if the client wishes to send it. If they don't, the "courtesy to the web host" is done at the expense of the client. This particular web host takes umbrage at other sites taking advantage of their clients that way, which seems reasonable to me.

chrismorgan 3 hours ago

[dead]

jimmaswell 18 hours ago

A relatively minor concern is that query strings create a new cache entry, both in the browser and typically in server-side caches unless configured otherwise. So you might want to use URL fragment parameters if the parameters are only used by client-side JavaScript and the server response is the same.
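A small sketch of why the fragment avoids the extra cache entry: the fragment is not sent to the server, so a cache keyed on the requested URL never sees it. The `cache_key` function below is a deliberately simplified, hypothetical model (real caches may also vary on headers or normalise the query), and the `example.com` URLs are placeholders.

```python
from urllib.parse import urldefrag


def cache_key(url: str) -> str:
    """Simplified model of a cache key: the URL without its fragment."""
    return urldefrag(url).url


plain = cache_key("https://example.com/app")
with_query = cache_key("https://example.com/app?tab=2")
with_fragment = cache_key("https://example.com/app#tab=2")

# The query string produces a distinct key; the fragment does not.
query_makes_new_entry = with_query != plain          # True
fragment_shares_entry = with_fragment == plain       # True
```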

dredmorbius 17 hours ago

This is genius, kudos Chris.

It also makes me wonder what other noxious online behaviours might be addressed through ... creative ... client-side responses similar to this.

We've already seen, for years, sites attempting to socially condition people over the use of ad-blockers and JavaScript disablers. No reason why the Other Side can't fight back as well.

julianlam 20 hours ago

> After I implemented that feature, a page from one of my favourite websites refused to load in the console... the third URL returns an HTTP 404 error page. The website uses the query string to determine which one of its several font collections to show.

Yes, let's unilaterally decide that query strings are bad because one website (ab)uses query strings to load different fonts.

It's the query strings that are the problem, not the website!

jfc.

Look, I'm against UTM parameters as much as the next guy, but let's not throw away a perfectly good thing because tracking is evil.

ergonaught 20 hours ago

Adding your own garbage to someone else's URLs is in fact the problem. Could they handle your garbage better? Sure. Is your garbage still a problem? Yes.

SoftTalker 20 hours ago

Postel's law worked OK when people operated in good faith. But today the internet is full of abusers. Rejecting requests that aren't exactly what they should be is probably the best policy now.

wtallis 19 hours ago

Postel's law is typically stated as "be conservative in what you do, be liberal in what you accept from others". It's unfortunately common for people to ignore the first half and hallucinate a third clause demanding that the recipient stay silent about the errors they receive.

InsideOutSanta 19 hours ago

That website is not abusing query strings, though; its usage of query strings is perfectly cromulent. And TFA is not saying not to use query strings, but not to append random garbage to other people's URLs.

jorams 19 hours ago

The website uses the feature for its intended purpose. Adding random trash to the query string of another website assuming it'll ignore it is in fact a bad idea, always, even if you can usually get away with it.

LocalH 19 hours ago

The problem is adding query strings to the URLs of others. It's peak entitlement to think that's proper.

jedimastert 19 hours ago

> one website (ab)uses query strings

Really not abusing query strings from a standards perspective; a 404 is not an improper response to an unexpected query string.
