| ▲ | comex 2 hours ago | |
If you already know where the start of the opening tag is, then I think a regex is capable of finding the end of that same opening tag, even in cases like yours. In that sense, it’s possible to use a regex to parse a single tag. What’s not possible is finding opening tags within a larger fragment of HTML. | ||
| ▲ | kstrauser 17 minutes ago | parent [-] | |
For any given regex, an opponent can craft a string which is valid HTML but that the regex cannot parse. There are a million edge cases like:
and
Now your regex has to include balanced comment markers. Solve thatYou need a context-free grammar to correctly parse HTML with its quoting rules, and escaping, and embedded scripts and CDATA, etc. etc. etc. I don't think any common regex libraries are as powerful as CFGs. Basically, you can get pretty far with regexes, but it's provably (like in a rigorous compsci kinda way) impossible to correctly parse all valid HTML with only regular expressions. | ||