gwern | 2 days ago
> What I can't figure out is why you seem so confident that the OP didn't verify the LLM output and/or would have published anything written by the model, whether it was faulty or not (which again, in this case it wasn't).

Because I routinely catch people, often very intelligent and educated people, confidently posting LLM materials that they have not factchecked and which are wrong (such as, say, about history, where they might make an argument about human evolution based on cave-dwelling where 4o was merely off by a few hundred thousand years), and I still find confabulations in my own use of even frontier models like o1-pro or Gemini-2.5-pro (leaving aside o3's infamous level of crimes, which I think is probably unrepresentative of reasoning models and idiosyncratic to it).

And in this case, the prompt looks like it was very low quality. The author had plenty of chances to put in details from his own real experience as an actual Iranian - any intelligent, observant college undergrad routinely buying coffee ought to have plenty to say - and instead, it's super-vague waffle that... well... an LLM could have written about just about any country, swapping out a few clauses. ("Why drinking coffee in America has become so complicated")

> You're clearly allergic to basic LLM-style

This is not a 'basic' LLM style. It is a very specific, recent chatbot style. (Note that visarga was able to instantly tell it was the recent 4o, because that style is so distinct compared to the previous 4o - never mind Claude-3, Gemini, Llama, Grok, etc.) Further, there should be no single 'LLM-style'; it makes me sad how much LLM writing capability has been collapsed and degraded by RLHF tuning. Even my char-RNN outputs from 2015, never mind GPT-3-base in 2020, showed more occasional sparks of flair than a 2023 ChatGPT did.

> or at least the masquerading of LLM text as human, so I'm curious what you'd consider worse: 1. LLM-generated text reflecting an accurate prompt/input, or 2. genuine human BS wanting to be taken seriously? (e.g. The Areas of My Expertise by John Hodgman if it wasn't in jest)

#1 is worse (if it is not edited, factchecked, or improved, and is just dumped out raw), because there will be much more of it, and the intermingling of fact and fiction makes it harder to factcheck, harder to screen out of future training corpuses, and overall more insidious. Human BS serves as costly proof-of-work: because it is costly, once you recognize you are reading BS from someone like Elon Musk or Sam Altman, you can switch modes, ignore the factual content, and ask, 'why is he writing this? what purpose does this BS serve? who is the audience here and how are they using it?' and get something quite useful out of it. I have learned a lot from statements by humans where little or none of it was factually true. Whereas an LLM output may mean nothing more than 'some unattended code spent $0.0001 to spam social media with outputs from a canned prompt', if even that.