do you feel this way about every study with N~=54? For instance the GLP-1 brain cancer one?

You'll need to specify the study; I see several candidates in my search, several of which are quite old.

Generally, yes, low N is unequivocally worse than high N in supporting population-level claims, all else equal. With fewer participants or observations, a study has lower statistical power, meaning it is less able to detect true effects when they exist. This increases the likelihood of both Type II errors (failing to detect a real effect) and unstable effect size estimates. Small samples also tend to produce results that are more vulnerable to random variation, making findings harder to replicate and less generalizable to broader populations.

In contrast, high-N studies reduce sampling error, provide more precise estimates, and allow for more robust conclusions that are likely to hold across different contexts. This is why, in professional and academic settings, high-N studies are generally considered more credible and influential.

In summary, you really need a large effect size for low-N studies to be high quality.

▲

sarchertech 5 days ago | parent [-]

The need for a large sample size is dependent on effect size.

The study showed that 0 of the AI users could recall a quote correctly, while more than 50% of the non-AI users could.

A sample of 54 is far, far larger than is necessary to say that an effect that large is statistically significant.

There could be other flaws, but given the effect size you certainly cannot say this study was underpowered.
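As a rough check on that, here is a minimal sketch of a two-sided Fisher exact test using only the Python standard library. The counts are assumptions for illustration, not figures taken from the paper: 18 people per group, with 0 of 18 in one group and 10 of 18 (just over 50%) in the other recalling a quote. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k successes in row 1, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Assumed counts: rows are groups, columns are (recalled, did not recall).
p_value = fisher_exact_two_sided(0, 18, 10, 8)
print(f"two-sided p = {p_value:.6f}")
```

With these assumed counts the p-value comes out below 0.001, so under those assumptions a difference this large would not be a power problem at this N.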

▲

tomrod 5 days ago | parent [-]

You would need the following cohort size at each alpha level (the current cohort size is 18), at a power level of 80% with an effect size of 50%:

0.05: 11 people per cohort

0.01: 16 people per cohort

0.001: 48 people per cohort

So they do clear the sample size bar for that particular finding at the 99% level, though not quite the 99.9% level. Further, selection effects matter: are there any school-cohort effects? Is there a student bias (i.e. would a working person of the same age, or someone from a different culture or background, see the same effect)? Was assignment to control and test truly random? And so on. All of these would need a larger N to overcome.

So for students from the handful of colleges they surveyed, they identified the effect, but again, it's not bulletproof yet.
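For anyone who wants to check the cohort sizes listed above, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two proportions (two-sided test). It assumes "an effect size of 50%" means 0% vs 50% recall. That reading is an assumption, and the function name is hypothetical. The normal approximation is also rough when one proportion is exactly 0, so treat the output as ballpark.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Assumed proportions: 0% vs 50% recall.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, n_per_group(0.0, 0.5, alpha))
```

Under these assumptions the sketch gives 11 and 16 per cohort for the first two alpha levels, and 24 for 0.001, so the 48 above may come from a different formula or may count both cohorts together. Either way, 18 per cohort falls short at the 0.001 level.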

▲

sarchertech 4 days ago | parent [-]

With a greater than 99% probability that this is a real effect, I wouldn't expect this to be difficult to reproduce. But it turns out I misread the paper. It was actually an 80% effect size, so greater than a 99.9% chance of being a real effect. Of course, it could be the case that there is something different about young college students that makes them react very, very differently to LLM usage, but I wouldn't bet on it.