Remix.run Logo
hauntsaninja 3 hours ago

Yep, you can run bayesect to an arbitrary confidence level.

This script in the repo https://github.com/hauntsaninja/git_bayesect/blob/main/scrip... will show you that a) the confidence level is calibrated, b) how quickly you get to that confidence level (on average, p50 and p95)

For the failure rates you describe, calibration.py shows that you should see much higher accuracy at 300 tests

rs545837 an hour ago | parent [-]

You're right, at 300 tests bayesect converges to ~97-100% across the board. I reran with calibration.py and confirmed.

Went a step further and tested graph-weighted priors (per-commit weight proportional to transitive dependents, Pareto-distributed). The prior helps in the budget-constrained regime:

128 commits, 500 trials:

Budget=50, 70/30: uniform 22% → graph 33% Budget=50, 80/20: uniform 71% → graph 77% Budget=100, 70/30: uniform 56% → graph 65% At 300 tests the gap disappears since there's enough data to converge anyway. The prior is worth a few bits, which matters when bits are scarce.

Script: https://gist.github.com/rs545837/b3266ecf22e12726f0d55c56466...