cornholio 5 days ago

Great passion for the subject. The author clearly isn't discouraged by their less-than-perfect command of English, and didn't use an LLM to butcher the text's authentic character.

I find that in my own writing I no longer strive for perfect grammar and polish, since nowadays it actually cheapens the end result; everybody has perfect grammar today.

sowbug 4 days ago | parent [-]

Waaay off topic, but does anyone know why LLMs don't have poor grammar if they were trained on the average/poor grammar of the internet? Why don't they mix up then/than or it's/its, or use hypercorrections like "from you and I"?

(Update: of course I had to ask my friendly neighborhood LLM, and the answer is that correct usages still dominate incorrect ones, so the statistics favor correctness. Trainers also down-weight low-quality sources (comments like mine) and up-weight high-quality ones (published books, reputable news sites). Reinforcement learning from human feedback then adds further polish.)
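
To make the weighting idea concrete, here's a toy sketch in Python. The quality scores and the sampling scheme are made-up illustrations, not any lab's actual recipe:

    # Toy sketch (not any real pipeline): sample training examples in
    # proportion to a source-quality score, so correct usage dominates
    # the training mix even when low-quality sources contain errors.
    import random

    corpus = [
        {"text": "Its tail wagged.",  "source": "book",    "quality": 0.9},
        {"text": "It's tail wagged.", "source": "comment", "quality": 0.2},
        {"text": "It's raining.",     "source": "news",    "quality": 0.8},
    ]

    # High-quality sources (books, reputable news) are sampled more often.
    weights = [doc["quality"] for doc in corpus]
    batch = random.choices(corpus, weights=weights, k=1000)

    seen = {}
    for doc in batch:
        seen[doc["text"]] = seen.get(doc["text"], 0) + 1
    print(seen)  # correct usages end up overrepresented in the mix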

omneity 4 days ago | parent | next [-]

Two phenomena are at play: correct spellings tend to be the most common in aggregate in a large enough dataset, so there's a built-in bias, and the finetuning step (instruct SFT) helps the model home in on what it should use from the set of all possible formulations it saw in pretraining.

This is why LLMs can still channel typos or non-standard writing when you ask them to write in such a style, for example.
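
The frequency bias is easy to demonstrate: count what follows a given word in even a tiny corpus (the snippet below is an illustrative stand-in for web-scale data) and the correct form dominates the empirical next-token distribution:

    # Toy sketch: "should have" outnumbers "should of" in the data, so a
    # model that learns the empirical next-token distribution prefers
    # the correct form.
    from collections import Counter

    corpus = (
        "you should have asked . they should have known . "
        "we should have left . i should of known ."
    ).split()

    # Count what follows "should" -- a crude stand-in for a next-token
    # distribution.
    continuations = Counter(
        corpus[i + 1] for i, w in enumerate(corpus[:-1]) if w == "should"
    )
    total = sum(continuations.values())
    for word, n in continuations.most_common():
        print(f"P({word!r} | 'should') = {n / total:.2f}")
    # 'have' dominates 'of', so greedy decoding picks the correct form.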

tuetuopay 4 days ago | parent | prev [-]

I would also expect a grammar phase to be part of training, with an RL pass where the output is fed to a grammar-checking engine.
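
That's speculation on my part, but a grammar-based reward is easy to prototype. A minimal sketch, assuming the real language_tool_python package (a Python wrapper around LanguageTool); wiring it into an actual RL fine-tuning loop is omitted:

    # Hypothetical reward signal for an RL pass: score model samples by
    # how many grammar-checker matches they trigger.
    import language_tool_python

    tool = language_tool_python.LanguageTool("en-US")

    def grammar_reward(text: str) -> float:
        """Fewer grammar-checker matches -> higher reward."""
        issues = tool.check(text)   # list of rule matches found in text
        return -float(len(issues))  # zero issues gives the best reward

    print(grammar_reward("Their going to the store."))    # penalized
    print(grammar_reward("They're going to the store."))  # scores better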