Remix clone Hacker News


	▲	lsp 3 hours ago
		The phrasing. "It's not just X, it's Y," overuse of "quotes"
	▲	dspillett 2 hours ago
		The problem with any of these tells is that an individual instance is often taken as proof on its own rather than as an indicator. People do often use “it isn't X, it is Y”-like constructs¹, and many, myself included sometimes, overuse “quotes”², or use em-dashes³, or are overly concerned about avoiding repeating words⁶, and so forth. LLMs do these things because they are in the training data, which means that people do these things too. It is sometimes difficult not to sound like an LLM-written or LLM-reworded comment… I've been called a bot a few times despite never using LLMs for writing English⁴.
		--------
		[1] Particularly vapid space-filler articles/comments or those using whataboutism-style redirection, which might be a significant chunk of model training data because of how many of them are out there.
		[2] I overuse footnotes as well, which is apparently a smell in the output of some generative tools.
		[3] A lot of pre-LLM style-checking tools would recommend this in place of hyphens, and some automated reformatters would make the change without asking, so there are going to be many examples in training data.
		[4] I think there is one at work in VS, which I use in DayJob, when it is suggesting code completion options to save typing (literally Glorified Predictive Text), and I sometimes accept its suggestion. Some of the tools I use to check my Spanish⁵ may be LLM-based too, so I can't claim that I don't use them at all.
		[5] I'm just learning, so automatic translators are useful to check that what I've written isn't gibberish. For anyone else doing the same: make sure you research any suggested changes, preferably using pre-2023 sources, because the output of these tools can be quite wrong, as you can see when translating into a language you are fluent in.
		[6] Another common “LLM tell”, because they often have weighting functions specifically designed to avoid token repetition, largely to avoid getting stuck in loops. But many pre-LLM grammar-checking tools will pick people up on repeated word use too, and people tend to fix the direct symptom with a thesaurus rather than improving the sentence structure overall.