Remix.run Logo
magnio 4 hours ago

Never, ever, ever transform URIs and paths by string manipulation. If you think pulling in a library for this is overkill, it is not.

(Lesson learned from trying to quickly write my own function to make ".." to go back one URL segment that took 3 hours and discovering the URI spec contradicts my intuition depending on whether the URI is a URL or filesystem path.)

unbelievr 30 minutes ago | parent | next [-]

Differentials between different URI parsers are a huge source of bugs. The amount of shenanigans you can do inside URIs is bonkers, and trying to handle this by yourself with some regex and string splitting is absolutely insane.

Like https://www.example.com:443@203569230:8080/ will send you to the IP address "12.34.56.78" on port 8080 using basic authentication with the domain and port as username and password. If your code tries to split by `:` or check that the URI starts with some specific string, then it won't be good enough. Indeed, use a library that you trust.

Joker_vD 3 hours ago | parent | prev [-]

I don't believe Python's urllib has a function that takes what HTTP terms an "origin-form" (an absolute path with possibly a query attached to it with "?") and parses it apart.

Still, the RFC 9112 that defines HTTP/1.1 basics requires that, for the purposes of URI reconstruction, "if there is no Host header field or if its field value is empty or invalid, the target URI's authority component is empty."

hft 2 hours ago | parent | next [-]

https://docs.python.org/3/library/urllib.parse.html

Joker_vD 20 minutes ago | parent [-]

Yep, none of them are suitable for this use case; you need to validate the Host header first and reconstruct the URI first before parsing it.

20 minutes ago | parent | prev | next [-]
[deleted]
teh64 an hour ago | parent | prev [-]

[dead]