j_da 5 days ago
All great points. A limitation with human feedback is that once you ask for more than binary preferences (e.g. multiple rankings or written feedback), the quality of the feedback decreases. For instance, humans can often give a quick preference judgment, but when asked why they prefer one thing over another, they may not be able to fully explain it in language. How to collect and incorporate the most useful types of feedback is very much an open area of research. I definitely agree with your second point. One idea we're experimenting with is adding a human baseline, in which the models are benchmarked against human-generated designs as well.