jiggawatts | 3 months ago
That was pretty boring too! The "script" was just a few hundred lines of C# code driving Selenium via its SDK. The requirement was simply to load a set of URLs with two different browsers: an "old" one and a "new" one that included a (potentially) breaking change to cookie handling that the customer needed to check for across all sites. I didn't need to fully crawl the sites, I just had to load the main page of each distinct "web app" twice, but I had to process JavaScript and handle cookies.

I did this in two phases:

Phase #1 was to collect "top-level" URLs, which I did via Certificate Transparency (CT). There are online databases that can return all valid certs for domains with a given suffix. I used about a dozen known suffixes for the state government, which resulted in about 11K hits from the CT database. I dumped these into a SQL table as the starting point. I also added in distinct domains from load balancer configs provided by the customer. This provided another few thousand sites that are child domains under a wildcard record and hence not easily discoverable via CT. All of this was semi-manual and done mostly with PowerShell scripts and Excel.

Phase #2 was the fun bit. I installed two bespoke builds of Chromium side by side on the 120-core box, pointed Selenium at both, and had them trawl through the list of URLs in headless mode. Everything was logged to a SQL database. The final output was any difference between the two Chromium builds, e.g. JS console log entries that differ, cookies that are not the same, etc.

All of this was related to a proposed change to the Public Suffix List (PSL), which has a bunch of effects on DNS domain handling, cookies, CORS, DMARC, and various other things. Because the PSL is baked into browser EXEs, the only way to test a proposed change ahead of time is to produce your own custom-built browser and test with that to see what would happen.
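(The Phase #1 extraction step can be sketched roughly like this. This is a Python illustration, not the actual PowerShell the author used; it assumes a crt.sh-style JSON dump where each entry's "name_value" field holds newline-separated SAN hostnames, and the suffix "example.gov" stands in for the real government suffixes.)

```python
import json

def extract_domains(ct_json: str, suffixes: list[str]) -> set[str]:
    """Pull distinct hostnames matching any known suffix out of a
    crt.sh-style JSON dump. Wildcard entries like "*.apps.example.gov"
    are collapsed to their base domain."""
    domains = set()
    for entry in json.loads(ct_json):
        # "name_value" may contain several SAN names, one per line
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # drop wildcard prefix
            if any(name == s or name.endswith("." + s) for s in suffixes):
                domains.add(name)
    return domains

# Hypothetical sample of what the CT database might return:
sample = json.dumps([
    {"name_value": "*.apps.example.gov\nportal.example.gov"},
    {"name_value": "unrelated.com"},
])
found = extract_domains(sample, ["example.gov"])
```

The deduplicated set would then be bulk-inserted into the SQL table, alongside the domains pulled from the load balancer configs.)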
In a sense, there's no "non-production Internet", so these lab tests are the only way.

Actually, the most compute-intensive part was producing the custom Chromium builds! Those took about an hour each on the same huge server.

By far the most challenging aspect was... the icon. I needed to hand over the custom builds to web devs so that they could double-check the sites they were responsible for, and they were also needed for internal-only web app testing. The hiccup was that the two builds looked the same and ended up with overlapping Windows taskbar icons! Making them "different enough" that they didn't share profiles and had distinct toolbar icons was weirdly difficult, especially the icon.

It was a fun project, but the most hilarious part was that it was considered such a large-scale effort that major groups of domains were farmed out to several consultancies to split up the work. I just scanned everything because it was literally simpler. They kept telling me I had "exceeded the scope", and for the life of me I couldn't explain to them that treating all domains uniformly is less work than trying to determine which domain belongs to which agency.
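(The Phase #2 comparison between the two builds reduces to a per-URL diff of what each headless run logged. A minimal sketch, assuming each run is summarized as a dict with a cookie name-to-value mapping and a list of console messages; the Selenium harness and SQL logging are omitted, and all names here are hypothetical:)

```python
def diff_runs(old: dict, new: dict) -> dict:
    """Return only the fields that differ between the old and new
    Chromium builds for one URL; an empty dict means no behavior change."""
    diffs = {}

    # Cookies: report (old_value, new_value) pairs that disagree,
    # with None marking a cookie absent in one build.
    old_c, new_c = old.get("cookies", {}), new.get("cookies", {})
    cookie_diff = {k: (old_c.get(k), new_c.get(k))
                   for k in set(old_c) | set(new_c)
                   if old_c.get(k) != new_c.get(k)}
    if cookie_diff:
        diffs["cookies"] = cookie_diff

    # Console logs: report messages unique to each build.
    old_log, new_log = old.get("console", []), new.get("console", [])
    if old_log != new_log:
        diffs["console"] = {
            "old_only": [m for m in old_log if m not in new_log],
            "new_only": [m for m in new_log if m not in old_log],
        }
    return diffs

# Hypothetical per-URL results from the two headless runs:
old_run = {"cookies": {"session": "a", "tracking": "x"}, "console": ["loaded"]}
new_run = {"cookies": {"session": "a"},
           "console": ["loaded", "cookie rejected: tracking"]}
result = diff_runs(old_run, new_run)
```

With the PSL change in play, a cookie dropped by the new build (as in the hypothetical sample above) is exactly the kind of difference the scan was hunting for.)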
pdimitar | 3 months ago
EXTREMELY nice. Wish I was paid to do that. :/