100ms 6 hours ago

Including a strong motivating example might have helped sell this; using an example that could trivially be expressed as a GET is extremely distracting.

Even imagining a QUERY with a large JSON filtering structure, or say an image input as request body, it feels extremely odd to include the request body as part of the cache key. It also implies an unbounded and user-controlled cache key, with the only really meaningful general caching strategy being bitwise compare of the request body (or a hash), which in a hostile scenario implies cache busting would be trivial.

This invokes multiple semantic oddities in one go with obvious difficulties for a very niche use case. If I'm writing a service that needs complex filtering or complex input like an image, any form of caching (e.g. individual data columns of a join, or embeddings keyed by perceptual hashes of a decoded image input) is going to be far away from the HTTP layer and certainly unrelated to the exact bit representation of the request on the wire.

Why even bother trying to capture this in a generic way?

I would be far more inclined to try and capture this caching semantic as a new header for POST. Something like "Vary: request-body" or similar. Perfectly backwards compatible and perfectly ignorable for all but the 0.1% of CDN use cases where the behaviour might turn out useful.

Joker_vD 6 hours ago | parent | next [-]

> It also implies an unbounded and user-controlled cache key,

The query part of GET's URI is also barely bounded in practice and user-controlled, and is indeed used as part of the cache key (because it's a part of URI), so I am not sure why you raise this objection at all.

giancarlostoro 6 hours ago | parent | next [-]

> and user-controlled

I've found some sites that tack on a session ID, and if you try to tamper with the URL in any way, they send you back to "Page 1". It really annoys me, lol. At that point, just let me skip to any page with your web UI.

PunchyHamster 5 hours ago | parent | prev [-]

Well, because it is more code. Current caching software caches by headers + query string. It now needs to be expanded to cache by body too.

It feels very pointless, and there is no drawback to just using POST.

OvervCW 4 hours ago | parent | next [-]

There is: your browser or other type of client does not know it can repeat a POST request if it fails, whereas a QUERY request can be freely repeated in case of errors.
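To make the retry point concrete, here is a minimal sketch of a client-side retry policy keyed on the method. The idempotent set follows RFC 9110; treating QUERY as idempotent is an assumption based on how the draft under discussion describes it, and `send_with_retry` and `flaky` are hypothetical names used only for illustration.

```python
from typing import Callable

# Idempotent methods per RFC 9110, plus QUERY (assumed idempotent, per the draft).
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE", "QUERY"}


def send_with_retry(method: str, send: Callable[[], int], max_attempts: int = 3) -> int:
    """Call send() and retry on connection errors only if the method is idempotent."""
    attempts = max(1, max_attempts) if method.upper() in IDEMPOTENT_METHODS else 1
    last_error: Exception = ConnectionError("request was never attempted")
    for _ in range(attempts):
        try:
            return send()
        except ConnectionError as exc:
            # Safe to try again only because repeating the request has no extra effect.
            last_error = exc
    raise last_error


# Hypothetical transport that fails twice, then succeeds with status 200.
calls = {"n": 0}


def flaky() -> int:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return 200


print(send_with_retry("QUERY", flaky))  # succeeds on the third attempt
```

With "POST" in place of "QUERY", the same call gives up after the first failure, because the client cannot know whether repeating the request is safe.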

afavour 4 hours ago | parent | prev [-]

Is caching not the primary reason to use this over POST? You should never want to cache POST requests.

CodesInChaos 6 hours ago | parent | prev | next [-]

The browser can simply store a collision resistant hash (e.g. SHA-256) of the body, if it wants a smaller cache key. I can't really think of any caching related attacks that don't equally apply to a query parameter. Generating a unique 30 character query parameter is just as easy as generating a 30 MB request body, if you want to flood the cache.
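A minimal sketch of that idea, assuming the cache treats the body as opaque bytes. The function name `cache_key` and the example URL are made up for illustration, and a real cache would also fold in whichever headers `Vary` selects.

```python
import hashlib


def cache_key(method: str, url: str, body: bytes) -> str:
    """Build a fixed-size cache key from the method, URL, and raw body bytes."""
    digest = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), body):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


small = cache_key("QUERY", "https://example.test/search", b'{"q": "cats"}')
large = cache_key("QUERY", "https://example.test/search", b"x" * 1_000_000)
print(len(small), len(large))  # both keys are 64 hex characters
```

The key size stays constant regardless of body size, so the storage cost per cache entry is the same as for a hashed URL.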

ralferoo 4 hours ago | parent [-]

Not necessarily that simple, as you'd have to sort all the input parameters to maintain a usable cache key. Not especially difficult, but if the data is large and so re-allocation and sorting is required, then you're starting to open up the attack surface where bugs might have been introduced.

inigyou 4 hours ago | parent | prev | next [-]

Not all usage scenarios are the public internet, and something doesn't have to be useful on the public internet to be standardized.

Realistically, systems for the public internet will use a secure hash as the cache key so it'll always be the same size. The cache key already includes a URL that can be very long, and an arbitrary set of header values.

ralferoo 4 hours ago | parent [-]

Except that by definition, in a URL the data has no implicit meaning, so for a cache hit you need an exact match, including order and case. A list of POST parameters, however, could legitimately be in any order, so you can't just hash it all as a blob: you need to sort the keys, possibly copy data around (unless using keys plus hash), probably allocate more memory, etc. I'm pretty certain we'll see at least one CVE out of the first few implementations of this!

inigyou 3 hours ago | parent [-]

POST/QUERY data can be in any format. Who are you to say order doesn't matter? Are you sure you can even parse it? Mine is in DES-encrypted (with key "password") base85 DER, you really gonna implement that in your proxy?
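The two positions in this sub-thread can be put side by side in a short sketch. It assumes the body happens to be JSON, which is exactly what a generic proxy cannot assume; `raw_key` and `canonical_json_key` are hypothetical names, and sorting keys with `json.dumps` is not a complete canonicalization scheme.

```python
import hashlib
import json


def raw_key(body: bytes) -> str:
    """Opaque approach: hash the exact bytes on the wire."""
    return hashlib.sha256(body).hexdigest()


def canonical_json_key(body: bytes) -> str:
    """Format-aware approach: parse, sort object keys, re-serialize, then hash."""
    parsed = json.loads(body)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


a = b'{"color": "red", "size": 2}'
b = b'{"size": 2, "color": "red"}'

print(raw_key(a) == raw_key(b))                        # False: different bytes
print(canonical_json_key(a) == canonical_json_key(b))  # True: same parsed content
```

The opaque hash is always correct but misses some hits; the format-aware key gets more hits at the cost of parsing untrusted input inside the cache, which is the extra attack surface mentioned above.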

tanepiper 3 hours ago | parent | prev | next [-]

One example - I'm building an MCP server at the moment for a database I'm working on. In ChatGPT I want to do dry-run posts first that roll back before committing - both are POST requests with a property - and it loves to trigger the safety layer in the tools (for various reasons, it's hard to debug exact causes)

But I think this would make it better - QUERY before POST means different request types, not just the same with a safety flag.

cryptonym 6 hours ago | parent | prev | next [-]

Sure, you can provide an image as request body, but you could already do it with a base64 query parameter. If you try hard enough, you can poorly use any proposed standard. GET with query parameters is already opaque and makes cache busting trivial.

layer8 6 hours ago | parent [-]

Query parameters are length-limited, because HTTP URIs are: https://www.rfc-editor.org/info/rfc9110/#section-4.1-5. There is no expectation that arbitrarily long HTTP URLs will work.

cryptonym 5 hours ago | parent [-]

Your link doesn't say URIs are length-limited

Draiken 5 hours ago | parent [-]

I'm guessing you never hit this issue then, but it's a real issue. Whether or not it's in the RFC as a hard limit doesn't matter; no HTTP server will allow URIs of unlimited size.

You simply can't base64 large payloads and you're stuck with workarounds.

cryptonym 5 hours ago | parent [-]

You are guessing wrong. Thanks, I know specific implementations will come with their own limits. This will equally apply to QUERY body size and caching strategy.

Are we seriously ok with linking the RFC as a source while providing a statement that doesn't match it? The RFC does matter.

ralferoo 4 hours ago | parent [-]

The RFC does say "It is RECOMMENDED that all senders and recipients support, at a minimum, URIs with lengths of 8000 octets in protocol elements."

One can infer from the RFC that you can reasonably expect many implementations to fail beyond 8000 characters, and that there are no guarantees up to that either.

True, the RFC doesn't specify a limit, but it does clearly indicate that it's not unbounded, nor should you expect it to be.
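As a rough illustration of how a client might act on that recommendation, here is a sketch that uses GET while the full URL stays within 8000 octets and falls back to a body-carrying method otherwise. The 8000 figure is the RFC text quoted above; `pick_method`, the fallback to QUERY, and the example URL are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Minimum URI length RFC 9110 recommends that senders and recipients support.
RECOMMENDED_MIN_URI_OCTETS = 8000


def pick_method(base_url: str, params: dict[str, str]) -> str:
    """Return "GET" if the encoded URL fits the recommended minimum, else "QUERY"."""
    url = f"{base_url}?{urlencode(params)}"
    if len(url.encode("utf-8")) <= RECOMMENDED_MIN_URI_OCTETS:
        return "GET"
    # Beyond this size there is no reasonable expectation that the URI will work.
    return "QUERY"


print(pick_method("https://example.test/search", {"q": "cats"}))
print(pick_method("https://example.test/search", {"img": "A" * 10_000}))
```

Staying under the threshold is still no guarantee, since the RFC only recommends that length as a minimum to support.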

epolanski 6 hours ago | parent | prev | next [-]

> Why even bother trying to capture this in a generic way?

I guess it's about resolving the odd semantics of using POST, which is not idempotent, and thus allowing easier control flow for caches and retries.

Your perspective is 100% correct if you think at the application layer, but with a dedicated method, you can have that behaviour out of the box from your HTTP infrastructure (whether it's your hyperscaler's router or your apache/nginx/browser, whatever) and stop implementing the POST-as-a-query edge case yourself.

friendzis 6 hours ago | parent | prev | next [-]

> It also implies an unbounded and user-controlled cache key.

While the concern is valid, caching is entirely optional at query level, therefore it is totally valid to cache only certain "filters".

davidkwast 6 hours ago | parent | prev | next [-]

I would use a hash of the body content (the query) as a URL parameter

/?hash=123456789

Joker_vD 6 hours ago | parent [-]

Why? That pushes more work onto both yourself and the cache.

wang_li 5 hours ago | parent | prev [-]

If you control the full stack then the functionality described here can be implemented with POST. The only way this comes into play is if some second party client of your service is trying to impose rules on how your backend works. My answer to that is no. I will be defining the contract by which my services operate.