Remix.run Logo
SugarReflex 10 hours ago

I hope this comment is not out of place, but I am wondering what the application for all this is? How can this help us or what does it teach us or help us prove? I am asking out of genuine curiosity as I barely understand it but I believe it has something to do with probability.

edit: thanks for the responses! I was not even familiar with `git bisect` before this, so I've got some new things to learn.

augusto-moura 10 hours ago | parent | next [-]

The writeup [1] linked on the README has examples and a better explanation

[1]: https://hauntsaninja.github.io/git_bayesect.html

Retr0id 10 hours ago | parent | prev | next [-]

If you're git bisecting a flakey test, normally your only option is to run the test many times until you're ~certain it's either flakey or not flakey. If your test suite is slow, this can take a long time.

One way to think about the tool presented is that it minimizes the number of times you'd need to run your test suite, to locate the bad commit.

ajb 9 hours ago | parent [-]

It's worth noting that the analysis (although not this specific algorithm) applies in cases where there is a deterministic approach, but a nondeterministic algorithm is faster.

For example, suppose you have some piece of hardware which you can interrogate, but not after it crashes. It crashes at a deterministic point. You can step it forward by any amount of steps, but only examine it's state if it did not crash. If it crashed, you have to go back to the start. (I call this situation "Finnegan Search", after the nursery rhyme which prominently features the line "poor old Finnegan had to begin again").

The deterministic algorithm has you do an examination after every step. The nondeterministic algorithm has you choose some number of steps, accepting the risk that you have to go back to the start. The optimal number of steps (and thus the choice of algorithm) depends on the ratio of the cost of examination to the cost of a step. It can be found analytically as the expected information gain per unit time.

(Either way the process is pretty annoying and considerable effort in hardware and software design has gone into providing ways to render it unnecessary, but it still crops up sometimes in embedded systems).

curuinor 10 hours ago | parent | prev | next [-]

Bayesian inference is, to be overly simple, a way to write probabilistic if-statements and fit them from data. The "if" statement in this case is "if the bug is there...", and of course it's often the case that in actual software that if statement is probabilistic in nature. This thing does git bisect with a flaky bug with bayesian inference handling the flakiness to get you a decent estimate of where the bug is in the git history. It seems to be usable, or at least as usable as a Show HN thingy is expected to be.

teckywoe 9 hours ago | parent | prev | next [-]

Opening the discussion to include properties of nondeterministic bugs.

Often these bugs depend on timing, caused by unpredictable thread scheduling, CPU load, disk and networking timing, etc. Git commits can affect app timing and change the likelihood of the bug occurring, but in many cases these changes aren't related to the underlying bug. That's distinct from a regular git bisect to find a deterministic bug.

One cool bayesect application is to identify the commit that most frequently hits the bug, so it's easier to debug. But more broadly, I'm wondering about the underlying philosophy of bisection for nondeterministic bugs, and when I'd use it?

hrmtst93837 an hour ago | parent | next [-]

Bayesian bisection helps when the failure rate shifts enough to measure. Once a bug rides on scheduler jitter or some external side effect you don't control, the posterior gets mushy fast.

In practice bayesect gives you a ranked suspect list, and that's still better than poking at the repo blind, but calling the top commit "the cause" is where people fool themselvs. Half the time the winner only nudged timing and the bug still lives somewhere else.

rs545837 2 hours ago | parent | prev [-]

[dead]

ncruces 8 hours ago | parent | prev [-]

If you want to read more about the bisect idea itself, this link has a bunch of interesting use cases:

https://research.swtch.com/bisect

TLDR: you can bisect on more than just "time".