logicprog an hour ago
> This analysis showed that there is indeed an absence of evidence, but it concludes there is evidence of absence.

I tried pretty hard to avoid saying that, can you point me at how to rephrase? The point I'm trying to make is just that there is absolutely no evidence at all for what people are saying with such absolutism and claimed objectivity (that Claude made rsync worse), and thus it doesn't justify the outrage.

> Under-sampling, and concluding with p > 0.05

How would I avoid under-sampling here? And if you're going to say it's because I only have 2 data points, well, the side making the positive claim, that Claude made rsync worse, only had two as well, and unremarkable ones at that, as I've tried very hard to show.
runarberg an hour ago
You are interpreting the p-values on their own merit rather than using them to test a null hypothesis. Quotes like:

> With a p-value of 74%, the answer is a decisive no. The odds ratio is 1.06, essentially 1:1. Claude releases are no more likely to be above the median than any other releases.

are problematic in this context, as the correct conclusion here is that you just don't have enough data to conclude whether or not you are more likely to encounter a bug after a Claude commit.

> How would I avoid under-sampling here?

You don't. You admit that you don't have enough data and move on. What you are trying to do here is prove a negative, which is extremely hard to do. In your discussion you claim that the users complaining had no right to; however, nothing in your analysis showed they were wrong. We simply don't have enough data (yet) to say either way. When we have enough data they may be proven right or wrong, but until then, we cannot conclude either way.

If you still insist, I recommend looking into Bayesian analysis. In theory, at least, the posterior distribution from a Bayesian analysis can be interpreted directly and analyzed on its own merits. However, I suspect your posterior will have way too much uncertainty to reach any conclusions.
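To make the last point concrete, here is a rough sketch of what such a Bayesian analysis might look like. The counts are hypothetical, not taken from the analysis under discussion: they assume 2 Claude releases with 1 above the median bug count, and 20 other releases with 10 above it. The flat Beta(1, 1) priors and the helper names `posterior_difference` and `credible_interval` are also assumptions made purely for illustration.

```python
import random


def posterior_difference(k_a, n_a, k_b, n_b, draws=20000, seed=0):
    """Sorted Monte Carlo draws of p_a - p_b under independent Beta(1, 1) priors.

    k_* is the number of releases above the median, n_* the number of releases.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior is Beta(1 + hits, 1 + misses).
        p_a = rng.betavariate(1 + k_a, 1 + n_a - k_a)
        p_b = rng.betavariate(1 + k_b, 1 + n_b - k_b)
        diffs.append(p_a - p_b)
    diffs.sort()
    return diffs


def credible_interval(sorted_draws, mass=0.95):
    """Equal-tailed credible interval read off the sorted posterior draws."""
    tail = (1.0 - mass) / 2.0
    lo = sorted_draws[int(tail * len(sorted_draws))]
    hi = sorted_draws[int((1.0 - tail) * len(sorted_draws)) - 1]
    return lo, hi


# Hypothetical counts (assumed for illustration only).
diffs = posterior_difference(k_a=1, n_a=2, k_b=10, n_b=20)
lo, hi = credible_interval(diffs)
print(f"95% credible interval for the difference: [{lo:.2f}, {hi:.2f}]")
```

With only two observations on one side, the interval for the difference spans a large range on both sides of zero, which is another way of saying that the data are compatible with Claude releases being noticeably worse, noticeably better, or no different.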