bayesnet 4 hours ago
I know this is grumpy, but I've never liked this answer. It is a perfect encapsulation of the elitism in the SO community—if you're new, your questions are closed and your answers are edited and downvoted. Meanwhile, this is tolerated only because it's posted by a member with high rep and username recognition.
1718627440 3 hours ago
I think this answer was tolerated back when SO wasn't as bad as it is now, and today it wouldn't be tolerated from anyone.
throwaway_61235 3 hours ago
As someone who used to write custom crawlers 20 years ago, I can confirm that regular expressions worked great. All my crawlers were custom-designed for a page, and the sites were mostly generated by some CMS and had consistent HTML. I don't remember having to do much bug-fixing related to regular expression issues. I don't suggest writing a generic HTML parser that works with any site, but for custom crawlers they work great. That's not to say the tools available now are the same as 20 years ago. Today I would probably use Puppeteer or some similar tool and query the DOM instead.
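To illustrate the point about site-specific crawlers (this is my own minimal sketch, not the poster's actual code; the HTML snippet and pattern are hypothetical): when one CMS template generates every page, a single regex tuned to that template can extract links reliably, even though it would be hopeless against arbitrary HTML.

```python
import re

# Hypothetical CMS-generated listing: every item follows the exact
# same template, which is what makes a per-site regex viable.
html = """
<div class="item"><a href="/post/1">First post</a></div>
<div class="item"><a href="/post/2">Second post</a></div>
"""

# Site-specific pattern: fragile for HTML in general, but stable as
# long as this one template doesn't change.
pattern = re.compile(r'<div class="item"><a href="([^"]+)">([^<]+)</a></div>')

links = pattern.findall(html)
for href, title in links:
    print(href, title)
```

The trade-off is exactly the one described above: this breaks the moment the site's markup changes, which is why a headless-browser approach (e.g. querying the DOM with Puppeteer's `page.$$eval`) is the more robust choice today.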