We really would benefit from a Bayesian binary search for this purpose, so you can get by with only running the test 1000 times in most cases.