bjourne 6 hours ago

Correct me if I'm wrong, but the problem is that it is almost impossible to evaluate sampling methods. You can't just look at perplexity and conclude that A is better than B, since perplexity measures how well the model predicts reference text, not how good the text produced by a given sampling method is. So you need large-scale, expensive human evaluations. Even if you have those, it is difficult to extrapolate from the results, since which sampling method works best depends on the dataset(s).