explosion-s 14 minutes ago

This is a super cool paper! The one thing I'm a bit skeptical of is the detection of 'slop' in the first place. It would be very exciting to see a way to steer a model toward more human-like output in general. I imagine part of the challenge is that you're 1) creating the measurement for 'slop' and then 2) reducing it. So yes, you can get impressive results against the slop you've identified (and the methodology may be novel, a slight improvement over DPO), but this becomes a chicken-and-egg problem, since each new model will exhibit slop differently depending on its training, RLHF, etc. That makes it less a method for improving an LLM's writing quality than a way of suppressing the symptoms of that bad writing.

I'm also concerned about removing the LLM's voice in the first place: although an LLM may heavily overuse specific words and phrases, so do individual authors. Pushing an LLM toward the average author would produce an 'average' voice that, paradoxically, sounds unlike any specific author.

I'm doing research on something very similar this summer, though focused more on the detection side than the generation side. I'd love to discuss this with someone in the field if you have a few minutes sometime!