Remix.run Logo
jedimastert 7 hours ago

You know I was actually really curious about this so I went back to the HTML and URL W3C standards and surprisingly they don't actually have any definitions of format other than being percent encoded. One might conflate query strings with "form-urlencoded"[0] query strings, which is one potential interoperability format, but in general a queries string is just any percent encoded string following a "?" in a url[1], and just another property in the "URL" HTML object that can be used in the generation of a response. While additionally there is a URLSearchParams object that is the result of parsing the query string with the form-urlencoded parser, this is simply an interoperability layer for JavaScript.

I'm going to be honest, I was pretty geared up to have a contrarian opinion until I looked at the standards but they're actually pretty clear, a 404 could be a proper response to unexpected query string; query string is as much part of the URL API as the path is and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.

[0]: https://url.spec.whatwg.org/#application/x-www-form-urlencod...

[1]: https://url.spec.whatwg.org/#url-class

wongarsu 5 hours ago | parent | next [-]

Back in the day it was reasonably common for CMSs and forums to only have an index.php, and routing entirely by query string (in form-urlencoded form, people were not savages). So you would have index.php?p=home and index.php?p=shop. Or index.php?action=showthread&forum=42&thread=17976. It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters

In fact lots of sites still work like that, they just hide it behind a couple rewrite rules in apache/nginx for SEO reasons

Semiapies 5 hours ago | parent | next [-]

If you're routing like it's 1999, sure, 404.

On the other hand, if it's a CRUD app and you're filtering a list of entities by various field values? Returning that no items matched your selection (or an empty list, if an API) makes more sense than a 404, which would more appropriate for an attempt to pull up a nonexistent entity URI.

Sander_Marechal 4 hours ago | parent | next [-]

There is no reason you can return that "no items matched your selection" with a 404 HTTP response code instead of a 200.

threatofrain 2 hours ago | parent [-]

A response code of 204 seems more appropriate but the problem is you're not allowed to send further information, which would make that descriptive response... not descriptive enough.

stouset 3 hours ago | parent | prev | next [-]

The point was that returning a 404 for unexpected query strings doesn’t just happen to okay per the specs, but that there is significant historical precedent for doing so based on application design that was common in the past.

brightball 3 hours ago | parent | prev [-]

Yea, empty response at a valid path. Isn’t 204 the code for it?

Lots of REST libraries that I’ve used treat any 400 response as an error so generating a 404 when for an empty list would just create more headaches.

HumanOstrich 39 minutes ago | parent | next [-]

Libraries that automatically throw errors for status codes in the 400 and 500 ranges are pretty obnoxious (looking at you, axios). It adds unnecessary overhead, complexity, and bad ergonomics by hijacking control flow from the app.

Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.

sk5t 3 hours ago | parent | prev | next [-]

204 might be acceptable if you aren’t returning an entity body to describe what is missing, but do wish to indicate the request was successful.

burnished 2 hours ago | parent | prev [-]

I think the author is comfortable creating headaches for people tacking query strings onto URLs

nofriend an hour ago | parent | prev | next [-]

> It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters

That's not obvious at all. If I receive json data that contains a property I'm not aware of, i don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire document because someone somewhere else is trying to pass information to itself is the wrong approach.

saimiam 32 minutes ago | parent [-]

> other parts of the stack

As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

If someone is not on the list, your job is to default to declining them access, not granting them access assuming level 2 security will handle them at a deeper layer.

It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

nofriend 2 minutes ago | parent [-]

>It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded, complexity.

> As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

there is no security benefit to filtering out unneeded url parameters.

sroussey 3 hours ago | parent | prev | next [-]

Oh no, looks like my old forum software urls.

panzi an hour ago | parent | prev [-]

watch?v=oHg5SJYRHA0

nofriend an hour ago | parent | prev | next [-]

Standards are just commonly accepted behaviour that somebody chose to write down somewhere. There are a great number of commonly accepted behaviours that nobody's ever bothered to encode into a formal standard, but where failure to follow the accepted practice will result in widespread breakage. There are also a great many "standards" that you would be a fool to follow to the letter. In the OP case, the only thing that will break is people trying to visit their site, who will presumably simply press the back button on their browser and go about their day. They can decide for themselves if that is an acceptable casualty. But it isn't definitionally acceptable because no standard says it isn't (nor would is suddenly become unacceptable because a standard said it was...)

bawolff 3 hours ago | parent | prev | next [-]

> I was pretty geared up to have a contrarian opinion until I looked at the standards but they're actually pretty clear, a 404 could be a proper response to unexpected query string; query string is as much part of the URL API as the path is and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.

This feels like a technically correct is the best kind of correct situation. Like technically, yeah web servers may respond 404 if they dont understand a query parameter, but in practise that is not how urls are conceptualized normally.

qiller 4 hours ago | parent | prev | next [-]

Interestingly, quite a few places that should treat query strings transparently make a lot of assumptions about their structure. We ran into that when picking a new CDN, some providers didn't handle repeat parameters (?a=1&a=2) correctly.

sroussey 3 hours ago | parent [-]

What’s do you mean by correctly?

kstrauser 3 hours ago | parent [-]

Incorrectly would be processing the query string and deduping keys. Correctly would be passing it through as-is, or at least only lightly processing it, like normalizing escaping or such.

sroussey 3 hours ago | parent [-]

Indeed I would expect pass through with no changes.

Though there are “smart” CDNs that will resize images etc. all beats are off for those.

ompogUe 3 hours ago | parent | prev | next [-]

Something I discovered looking back at some old sites: "pages" defined by URL params don't always make it into the Wayback Machine.

TZubiri 2 hours ago | parent | prev | next [-]

Whatwg is for html, try the IEEE http rfcs

nrds 5 hours ago | parent | prev [-]

Wait until you realize that the difference between path and query string is entirely arbitrary and decided by the server. Query strings should never have existed. They are an implementation detail of CGI webservers that leaked all over everything and now smells really bad.

mikeocool 5 hours ago | parent | next [-]

I dunno, it seems like the fact that we arrived at a fairly standard structure for URL paths that works pretty well is not a bad outcome.

Seems a lot better than the other potential world we could lived in, where paths were a black box and every web server/framework invented their own structure for them.

hamburglar 4 hours ago | parent | next [-]

My next website is going to have the path portion of the URL be a base64 encoded ASN.1 blob.

gritzko 4 hours ago | parent | prev [-]

In my current project I use URIs to refer to absolutely any entity in a git(-ish) repo. Files, branches, revisions, diffs, anything. URI turns out to be a really good addressing scheme for everything. Surprise. But the most used and abused element is always the path. Query takes a lot of that mess away. Might have been unmanageable otherwise.

https://github.com/gritzko/beagle

gritzko 4 hours ago | parent [-]

In fact, GitHub URIs are a good example of overusing paths: https://github.com/gritzko/beagle/blob/a7e17290a39250092055f...

  - user gritzko,
  - project beagle, 
  - view blob, 
  - commit a7e17290a39250092055fcda5ae7015868dabdb4, 
  - file path VERBS.md
... all concatenated indiscriminately.
altairprime 2 hours ago | parent | next [-]

That’s not an indiscriminate hierarchy.

Grouping data by user is common and normal in computing: /home laid precedent decades ago.

Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.

The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.

Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.

File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.

So, then, which part of the ordering and path selections do you consider indiscriminate, and why?

em-bee an hour ago | parent [-]

actually, instead of the object hash, you could also use the commit-hash. then the filename would be mandatory, but the url would be more readable and usable: give me the file VERBS.md as it is at commit <hash>

em-bee 3 hours ago | parent | prev | next [-]

what would be a better way of doing that? i am not disagreeing, but i just can't think of any way to improve on this. put everything into the query part? i prefer to use the query only for optional arguments. in this example the blob argument is the only thing that doesn't fit in my opinion.

arjvik 3 hours ago | parent [-]

Every object in git (commit, tree, revision of a single file) has a hash that is guaranteed unique within a repository (otherwise many more things than a web UI would break) and likely also globally. I can understand wanting to isolate repositories to prevent hash collisions from causing problems, but within a repo everything has a universally unique ID.

edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.

em-bee an hour ago | parent [-]

that's not what i meant. i was trying to suggest that the string "blob" does not fit. why is it there? why is it needed?

    https://github.com/gritzko/beagle/a7e17290a39250092055fcda5ae7015868dabdb4/VERBS.md
this should be sufficient to represent the file.

"blob" is like a descriptor of the value that follows. it would be like doing this:

    https://github.com/user/gritzko/project/beagle/blob/a7e17290a39250092055fcda5ae7015868dabdb4/file/VERBS.md
this actually irks me every time i see it in a github url
sowbug 20 minutes ago | parent [-]

They are following the /key/value/key/value pattern, but the first two pairs in a GitHub URL are fixed to user and project, which lets them omit the key names. I could see them not being willing to hardcode the third pair to blob.

Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.

iainmerrick 3 hours ago | parent | prev [-]

Back in the day there was an attempt to introduce "matrix URIs" as a more structured alternative to query strings: https://www.w3.org/DesignIssues/MatrixURIs.html

Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.

halayli 4 hours ago | parent | prev | next [-]

Nothing you said here is correct. Paths, query strings, and fragments are all well defined entities. https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

sroussey 3 hours ago | parent [-]

It’s a string between ? and # isn’t well defined. Or it is and it says very little.

pverheggen 3 hours ago | parent | prev | next [-]

Not entirely arbitrary - forms that use the GET method instead of POST will append form values as query params.

For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

msandford an hour ago | parent [-]

It has always amazed me how much trouble the SPA folks are willing to go to in order to slowly rebuild just normal boring URLs with querystrings because users demand deep linking and back buttons and the like.

Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.

For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.

gpvos 5 hours ago | parent | prev | next [-]

Query strings existed before CGI did, and the way they're defined to be filled in from web forms is quite useful; I wouldn't want to need Javascript to fit that into path format. There's nothing wrong about having things decided by the server; I don't get that part of your argument at all.

cobbzilla 5 hours ago | parent [-]

Maybe dumb question: how does the server “decide” anything other than what file to serve? Today we have many choices but back in the day CGI was the first standard way to do it.

So yes query parameters existed before CGI but to use them you had to hack your server to do something with them (iirc NCSA web servers had some magic hacks for queries). CGI drove standardization.

stirfish 5 hours ago | parent [-]

    func specialHandler(w http.ResponseWriter, r *http.Request) {
 if time.Now().Weekday() == time.Tuesday {
  http.NotFound(w, r)
  return
 }

     fmt.Fprintln(w, "server made a decision")
    }
Your server can make decisions however you program it to, you know? It's just software.

Forgive the phone-posting.

cobbzilla 2 hours ago | parent | next [-]

and what server software is running this code in 1995?

lispwitch 31 minutes ago | parent [-]

CL-HTTP or AOLserver

cobbzilla 11 minutes ago | parent [-]

sure looks like VB there, what’s the plugin? Didn’t see anything like that before.

2 hours ago | parent | prev [-]
[deleted]
jolmg 5 hours ago | parent | prev | next [-]

It's arbitrary to a degree like the difference between using an attribute or child element in XML, but it's not entirely arbitrary. If you want to include data in the URL that's not part of the hierarchy of the path, query strings are good for that.

paulddraper 4 hours ago | parent | prev [-]

How do you figure?

Paths are hierarchical; query strings are name/value.

(Note I speak of common usage.)

You can create a different convention, but that one is pretty dang useful.