The need for a large sample size is dependent on effect size.

The study showed that none of the AI users could recall a quote correctly, while more than 50% of the non-AI users could.

A sample of 54 is far, far larger than is necessary to say that an effect that large is statistically significant.

There could be other flaws, but given the effect size you certainly cannot say this study was underpowered.
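To make the significance claim concrete, here is a minimal sketch of a two-sided Fisher exact test on a 2x2 table. The counts are hypothetical, not taken from the paper: they assume 18 people per cohort, 0 of 18 AI users recalling a quote, and exactly 9 of 18 non-AI users recalling one (the low end of "more than 50%"). The function name `fisher_exact_two_sided` is my own.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows are groups; columns are success / failure counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x successes landing in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts (assumed, not from the paper):
# AI users: 0 recalled, 18 did not; non-AI users: 9 recalled, 9 did not.
p = fisher_exact_two_sided(0, 18, 9, 9)
print(f"two-sided p = {p:.5f}")
```

With these assumed counts the p-value lands well below 0.01, which is the sense in which a sample this size is not underpowered for an effect that large.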

At a power level of 80% and an effect size of 50%, you would need the following cohort size (currently 18) at each alpha level:

0.05: 11 people per cohort

0.01: 16 people per cohort

0.001: 48 people per cohort
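For reference, the 0.05 and 0.01 figures can be reproduced with the standard normal-approximation formula for comparing two proportions. This is a rough sketch, not necessarily the calculator used above: it assumes a two-sided test, equal cohort sizes, and recall rates of 0% versus 50%. The function name `cohort_size` is my own.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cohort_size(p1, p2, alpha, power=0.80):
    """Per-cohort n for a two-sided, two-proportion z-test (equal cohorts)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z(power)            # quantile matching the desired power
    p_bar = (p1 + p2) / 2        # pooled proportion under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Assumed rates: 50% recall without AI, 0% with AI.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, cohort_size(0.5, 0.0, alpha))
```

Under these assumptions this gives 11 and 16 for the first two alpha levels. For 0.001 it gives 24 rather than 48, so that figure may come from a different method or may count both cohorts together; either way, 18 per cohort falls short at that level.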

So they do clear the sample-size bar for that particular finding at the 99% level, though not quite at the 99.9% level. Further, selection effects matter: are there any school-cohort effects? Is there a student bias (i.e., would a working person of the same age, or someone from a different culture or background, see the same effect)? Was assignment to the control and test groups truly random? And so on, all of which would need a larger N to overcome.

So for students from the handful of colleges they surveyed, they identified the effect, but again, it's not bulletproof yet.

	sarchertech 4 days ago
		With a greater than 99% probability that this is a real effect, I wouldn’t expect this to be difficult to reproduce. But it turns out I misread the paper. It was actually an 80% effect size, so a greater than 99.9% chance of being a real effect. Of course, it could be the case that there is something different about young college students that makes them react very, very differently to LLM usage, but I wouldn’t bet on it.