Thanks for mentioning me, but you really did the work! But in order to contribut...

Thanks for mentioning me, but you really did the work!

But in order to contribute something useful, as a rule of thumb you want to have 10 times as many passes than failures in order to reject a commit. If a bug has taken up to 2500 runs to reproduce, don't consider it a pass until 30000 runs have succeeded.

It's something to do with Poisson distributions. If you have 𝑛 runs before a failed run on average, and you want to be 𝑃 % certain that a fix (including a revert or moving beyond the bug in a bisect) reduced the failure rate, you can use the formula − 𝑛 ln (1 − 𝑃 /100) for how long to run, and the factor for 𝑃=99.99 is about 10.

In fact that means that once you had landed on a merge commit it was probably much better to switch to a linear backwards search because it might have fewer passing runs and passing runs are 10-15 times more expensive as failures. Is that what you did?