energy123 3 hours ago

An example of why a basic understanding is helpful:

A common sentiment on HN is that LLMs generate too many comments in code.

But comment spam is actually going to help code quality, because of the way causal transformers and positional encoding work. The model has learned to dump locally specific reasoning tokens right where they're needed, in a tightly scoped cluster that can be attended to easily and forgotten just as easily later on. It's like a disposable scratchpad that reduces errors in the code it's about to write.
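(Purely illustrative, not from the comment itself: a made-up snippet showing the kind of tightly scoped "scratchpad" comment being described, sitting directly above the lines it informs and referenced by nothing afterwards.)

    def merge_intervals(intervals):
        # sort by start so overlaps become adjacent; then each interval only
        # needs comparing against the last merged one
        intervals = sorted(intervals, key=lambda iv: iv[0])
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged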

The solution to comment spam is textual/AST post-processing of the generated code, rather than prompting the LLM to handicap itself by not generating as many comments.
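A minimal sketch of what that post-processing could look like, assuming the generated code is Python; strip_comments and the sample snippet are made up for illustration, not anything standard:

    import io
    import tokenize

    def strip_comments(source: str) -> str:
        # Drop COMMENT tokens and rebuild the source from what's left.
        kept = [tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
                if tok.type != tokenize.COMMENT]
        return tokenize.untokenize(kept)

    generated = (
        "# running total of the items\n"
        "total = 0\n"
        "for x in items:  # add each item\n"
        "    total += x\n"
    )
    print(strip_comments(generated))

untokenize leaves some stray whitespace where comments used to be, so a real pipeline would probably run a formatter over the result as well; an AST round-trip (ast.parse + ast.unparse) also drops comments, at the cost of discarding the original formatting too.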

minikomi an hour ago | parent | next [-]

An example of why a basic understanding is helpful:

A common sentiment on HN is that LLMs generate too many comments in code.

For good reason -- comment sparsity improves code quality, due to the way causal transformers and positional encoding work. The model has learned that real, in-distribution code carries meaning in structure, naming, and control flow, not dense commentary. Fewer comments keep next-token prediction closer to the statistical shape of the code it was trained on.

Comments aren’t a free scratchpad. They inject natural-language tokens into the context window, compete for attention, and bias generation toward explanation rather than implementation, increasing drift over longer spans.

The solution to comment spam isn’t post-processing. It’s keeping generation in-distribution. Less commentary forces intent into the code itself, producing outputs that better match how code is written in the wild and steering the model down more realistic context paths.
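(Again illustrative rather than anything from the thread: the same intent expressed once as a comment and once through naming, which is roughly what "intent into the code itself" looks like in practice.)

    # comment carries the intent
    def f(xs):
        # keep the even numbers and square them
        return [x * x for x in xs if x % 2 == 0]

    # naming and structure carry the intent
    def squares_of_evens(numbers):
        return [n * n for n in numbers if n % 2 == 0]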

libraryofbabel an hour ago | parent | prev | next [-]

Unless you have evidence from a mechanistic interpretability study showing what's happening inside the model when it creates comments, this is really only a plausible-sounding just-so story.

Like I said, it's a trap to reason from architecture alone to behavior.

p1esk 2 hours ago | parent | prev [-]

You’re describing this as if you actually knew what’s going on in these models. In reality it’s just a guess, and not a very convincing one.