Thanks for mentioning me, but you really did the work!
But in order to contribute something useful: as a rule of thumb, you want about 10 times as many passing runs as it takes to hit a failure before you reject a commit (that is, mark it as passing). If a bug has taken up to 2500 runs to reproduce, don't consider it a pass until 30000 runs have succeeded.
It's something to do with Poisson distributions. If you have $n$ runs before a failed run on average, and you want to be $P$% certain that a fix (including a revert or moving beyond the bug in a bisect) reduced the failure rate, you can use the formula $-n \ln(1 - P/100)$ for how many runs to do, and the factor for $P = 99.99$ is about 9.2, so roughly 10.
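If it helps, here is a minimal Python sketch of that formula. The function name `runs_needed` and its parameter names are my own, purely for illustration, and it assumes every run fails independently with the same probability, which is where the exponential comes from.

```python
import math


def runs_needed(mean_runs_to_failure: float, confidence_pct: float) -> int:
    """Clean runs to require before believing the failure rate went down.

    Hypothetical helper: computes -n * ln(1 - P/100), rounded up.
    Assumes independent runs with a constant failure probability.
    """
    if mean_runs_to_failure <= 0:
        raise ValueError("mean_runs_to_failure must be positive")
    if not 0 < confidence_pct < 100:
        raise ValueError("confidence_pct must be strictly between 0 and 100")
    factor = -math.log(1 - confidence_pct / 100)
    return math.ceil(mean_runs_to_failure * factor)


if __name__ == "__main__":
    print(runs_needed(2500, 99.99))  # 23026
    print(runs_needed(2500, 95))     # 7490
```

Note that this treats 2500 as the average number of runs to failure. If 2500 was the worst case you saw rather than the average, the real requirement could be lower, which is part of why a round, conservative number is easier to live with.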
In fact that means that once you had landed on a merge commit, it was probably much better to switch to a linear backwards search, because it might need fewer passing runs, and passing runs are 10-15 times more expensive than failures. Is that what you did?