Event study: a way to measure how returns respond to events. Popularized by Fama, Fisher, Jensen, and Roll in "The Adjustment of Stock Prices to New Information", and now ubiquitous in securities litigation, academic financial economics, and equity L/S research. The canonical recipe is MacKinlay's "Event Studies in Economics and Finance". Industry people tend to just use residual returns from an Axioma / Barra / in-house risk model.
So let's say your hypothesis is "stock go up on insider buy". An event study lets you test that hypothesis and quantify how much up, and over what horizon.
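To make that concrete, here's a minimal sketch of the MacKinlay market-model recipe: fit a market model over an estimation window, then cumulate abnormal returns over the event window. The column names, window lengths, and data layout are illustrative assumptions, not a spec.

```python
# Minimal market-model event study sketch (MacKinlay-style); assumes a daily
# DataFrame with 'ret' (stock return) and 'mkt_ret' (market return) columns.
import numpy as np
import pandas as pd

def car_for_event(prices: pd.DataFrame, event_idx: int,
                  est_win: int = 250, evt_win: tuple = (-10, 10)) -> pd.Series:
    """event_idx: integer position of the event day within `prices`."""
    # Estimation window ends just before the event window opens
    est = prices.iloc[event_idx - est_win + evt_win[0]: event_idx + evt_win[0]]
    # Market model r_it = alpha + beta * r_mt + eps, fit by OLS
    beta, alpha = np.polyfit(est["mkt_ret"], est["ret"], 1)
    evt = prices.iloc[event_idx + evt_win[0]: event_idx + evt_win[1] + 1]
    abnormal = evt["ret"] - (alpha + beta * evt["mkt_ret"])
    # Cumulative abnormal return, re-indexed to event time (-10 .. +10)
    car = abnormal.cumsum()
    car.index = range(evt_win[0], evt_win[1] + 1)
    return car

# Average the CAR paths over all insider-buy events, then t-test the post-event drift.
```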
Cool metadata table, I'm curious about the ticker source (Form 4, 10-K, one of the SEC's metadata publications?).
My comment about CUSIP linking was trying to illustrate a more general issue: it's difficult to use SEC data extractions to answer empirical questions if you don't have a good securities master to link against (reference data + market data).
Broadly speaking, a securities master has two kinds of data: reference data (identifiers and the dates when they're valid) and market data (price / volume / corporate actions... all the stuff you need to accurately compute total returns). CRSP/Compustat (~$40k/year?) is the gold standard for daily-frequency US equities. With a decent securities master you can do many interesting things: realistic backtests for the kinds of "use an LLM to code a strategy" projects you see all over the place these days, or (my interest) a "papers with code" style repository that helps people learn the field.
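To show the shape of the thing, here's a toy sketch of the reference table plus a point-in-time identifier lookup. All identifier values and column names below are placeholders (they echo the Compustat-style fields mentioned further down), not real reference data.

```python
# Toy securities master: reference table (identifiers + validity windows) and a
# point-in-time lookup. Values are placeholders for illustration only.
import pandas as pd

ref = pd.DataFrame({
    "gvkey":     ["012345", "012345"],
    "ticker":    ["OLDTKR", "NEWTKR"],      # ticker changed at some point
    "cusip":     ["00000010", "00000010"],
    "from_date": pd.to_datetime(["1995-01-01", "2010-06-15"]),
    "thru_date": pd.to_datetime(["2010-06-14", "2099-12-31"]),
})

def resolve(ref: pd.DataFrame, ticker: str, asof: str) -> pd.DataFrame:
    """Map a ticker to its permanent identifiers as of a given date."""
    d = pd.Timestamp(asof)
    hit = (ref["ticker"] == ticker) & (ref["from_date"] <= d) & (d <= ref["thru_date"])
    return ref[hit]

# Market data lives in a separate table keyed by the permanent id (gvkey/iid),
# never by ticker: (gvkey, date, close, volume, dividend, split_factor, ...)
```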
What you worry about with bad data is getting a high t-stat on a plausible-sounding result that later fails to replicate when you use clean data (or worse, when you try to trade it). Say your securities master drops companies two weeks before they're delisted: just holding the market is going to show serious (spurious) alpha. Ditto if your fundamental data reflects later restatements rather than the originally reported numbers.
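Here's a toy simulation of that delisting problem, purely to show the direction of the bias; the delisting rate, crash size, and two-week gap are all made-up assumptions.

```python
# Toy survivorship-bias demo: a vendor that drops names ~2 weeks before delisting
# misses the crash into delisting, so its "market" return is inflated.
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_days, drop_gap = 500, 252, 10
rets = rng.normal(0.0003, 0.02, size=(n_stocks, n_days))

# assume ~5% of names delist mid-year, with a sharp loss on the way out
delist_day = rng.integers(60, 200, size=n_stocks)
delisted = np.where(rng.random(n_stocks) < 0.05)[0]
for i in delisted:
    rets[i, delist_day[i] - 5: delist_day[i]] -= 0.10   # crash into the delisting
    rets[i, delist_day[i]:] = np.nan                    # gone afterwards

clean_mkt = np.nanmean(rets, axis=0)                    # keeps the crash days

biased = rets.copy()
for i in delisted:
    biased[i, delist_day[i] - drop_gap:] = np.nan       # vendor drops it 2 weeks early
biased_mkt = np.nanmean(biased, axis=0)

print("spurious cumulative 'alpha' over the year:", (biased_mkt - clean_mkt).sum())
```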
On the reference data front, the Compustat security table has (from_date, thru_date, cusip, ticker, cik, name, GICS sector/industry, gvkey, iid) etc., all lined up and ready to go. I don't think it's possible to get this kind of time series from cheap data vendors, but it might be possible to build it using some of the techniques you described, and maybe others. E.g. get a (company-name, cik, ticker) time series from Form 4s or 10-Ks. Then get a (security-name, cusip) time series from the 13F security lists the SEC publishes quarterly (as PDFs). Then merge on date / fuzzy name, then validate. To get GICS you'd need to do something like extract industry/sector names from a broad index ETF's quarterly holdings reports, whose format will change a lot over the years. Lots of tedious but valuable work, and a lot of surface area where LLMs could help. I dunno, at this point it may even be feasible to use LLMs to extract all of this info (annually) from 10-Ks.
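The merge step might look something like this; the table layouts and the similarity cutoff are assumptions about what the extraction steps would produce, and the name normalization here is deliberately crude.

```python
# Rough sketch of linking Form 4 / 10-K company names to the quarterly 13F security
# list on overlapping validity dates + fuzzy name similarity. Column names assumed.
import difflib
import pandas as pd

def name_sim(a: str, b: str) -> float:
    norm = lambda s: s.upper().replace(",", "").replace(".", "").replace(" INC", "").strip()
    return difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()

def link(edgar: pd.DataFrame, thirteenf: pd.DataFrame, cutoff: float = 0.90) -> pd.DataFrame:
    """edgar: (company_name, cik, ticker, from_date, thru_date)
    thirteenf: (security_name, cusip, from_date, thru_date)"""
    out = []
    for _, e in edgar.iterrows():
        # candidate rows whose validity window overlaps the EDGAR row's window
        cand = thirteenf[(thirteenf["from_date"] <= e["thru_date"]) &
                         (thirteenf["thru_date"] >= e["from_date"])]
        if cand.empty:
            continue
        scores = cand["security_name"].map(lambda s: name_sim(e["company_name"], s))
        best = scores.idxmax()
        if scores.loc[best] >= cutoff:
            out.append({**e.to_dict(), "cusip": cand.loc[best, "cusip"],
                        "name_score": scores.loc[best]})
    return pd.DataFrame(out)   # then validate, e.g. against known (ticker, cusip) pairs
```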
On the market data front, the vendors I've seen have random errors. They tend to be worst for dividends / corporate actions, but I've also seen BRK.A trade $300 trillion on a random Wednesday. The errors don't seem correlated across vendors, so I think this one might be relatively easy to solve by cross-checking. Cheap fundamental data tends to have similar defects to cheap market data.
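A minimal sketch of that cross-checking idea, assuming you have the same close-price panel from several vendors; the column layout and the 5% tolerance are arbitrary assumptions.

```python
# Cross-vendor cleaning sketch: take the per-day median price as consensus and
# flag any vendor print that strays too far from it.
import pandas as pd

def flag_bad_prints(panel: pd.DataFrame, tol: float = 0.05) -> pd.DataFrame:
    """panel: one row per (date, ticker), one close-price column per vendor,
    e.g. columns ['vendor_a', 'vendor_b', 'vendor_c']."""
    consensus = panel.median(axis=1)
    rel_dev = panel.sub(consensus, axis=0).abs().div(consensus, axis=0)
    flags = rel_dev > tol                     # True where a vendor's print looks wrong
    cleaned = panel.mask(flags)               # drop the suspect prints, keep the rest
    cleaned["consensus"] = consensus
    return cleaned
```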
Sorry for the long rant, I've thought about this problem for a while but never seriously worked on it. One reason I haven't undertaken the effort: validation is difficult, so it's hard to tell whether you're actually making progress. You can do things like check that S&P 500 member returns aggregate to SPY returns to see if you're waaay off, but detailed validation is difficult without a source of ground truth.
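For what it's worth, that coarse check is easy to sketch; the constituent table, weighting scheme, and column names here are assumptions, and the comparison only catches gross errors.

```python
# Sanity check: cap-weighted S&P 500 member returns should roughly track SPY total
# returns (minus fee / tracking noise). Persistent large gaps usually mean bad data.
import pandas as pd

def index_check(members: pd.DataFrame, spy_ret: pd.Series) -> pd.Series:
    """members: rows of (date, ticker, ret, mktcap) for the constituents as of each date;
    spy_ret: daily SPY total returns indexed by date."""
    # lag the weights one day so today's return uses yesterday's cap weights
    w = members.pivot(index="date", columns="ticker", values="mktcap").shift(1)
    w = w.div(w.sum(axis=1), axis=0)
    r = members.pivot(index="date", columns="ticker", values="ret")
    synthetic = (w * r).sum(axis=1)
    return synthetic - spy_ret.reindex(synthetic.index)
```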