yndoendo 6 days ago

It is 2025 and the best spell checker is a search engine. Numerous times an application will not provide the correct word. The only solution is to try the word in a search engine, and to try using it in a sentence if that fails.

In my opinion, this is where a local ML/AI model, no internet required, would be the most beneficial today.

Even had to use a search engine with "thoughts and opi" because I forgot how to spell "opinion" before posting this. The in-application spell checker was 100% useless at assisting me.

athrowaway3z 6 days ago | parent | next [-]

I've had a related idea for a while now.

Instead of having the LLM generate the most likely next token from the current text, you take your full text and use the LLM to score the likelihood/rank of each token that's already there. I'd imagine this creates a heatmap showing which parts are the most 'surprising'.

You wouldn't catch all misspellings, but it could be very useful information for finding what flows and what doesn't - or perhaps for explicitly going looking for something out of the norm to capture attention.
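A minimal sketch of that scoring pass, with a toy bigram model standing in for the LLM's per-token probabilities (the corpus, the test sentence, and the add-one smoothing are all illustrative assumptions, not anyone's actual implementation):

```python
import math
from collections import Counter

# Toy "language model": bigram counts over a tiny corpus -- a stand-in
# for the per-token probabilities a real LLM would return in one pass.
corpus = ("the letter came from the bank "
          "she filled in the form the bank sent "
          "a reply from the editor arrived "
          "he sent a note from home").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(unigrams)  # vocabulary size, used for add-one smoothing

def surprisal_heatmap(words):
    """Per-token surprisal -log P(w_i | w_{i-1}), add-one smoothed.
    Higher values mark the 'surprising' spots in the text."""
    heat = []
    for prev, w in zip(words, words[1:]):
        p = (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)
        heat.append((w, -math.log(p)))
    return heat

# The typo "form" (for "from") is a valid word, so a plain spell
# checker passes it -- but it scores as the most surprising token here.
heat = surprisal_heatmap("a reply form the editor arrived".split())
most_surprising = max(heat, key=lambda t: t[1])[0]
```

With a real LLM you would read these probabilities off the logits of a single forward pass over the text, which is why the whole thing stays cheap.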

paol 6 days ago | parent | next [-]

I would like this too. This approach would also fix the most common failure mode of spelling checkers: typos that are accidentally valid words.

I constantly type "form" instead of "from", for example, and spell checkers don't help at all. Even a simple LLM could easily notice an out-of-place word like that. And LLMs could just as easily go further and do grammar and style checking.

NitpickLawyer 6 days ago | parent | prev | next [-]

I've seen this in a UI. They went a step further: you could select a word (well, a token, but anyway) and "regenerate" from that point by picking another word from the token distribution. Pretty neat. It had the heatmaps you mentioned, based on the probabilities returned by the LLM.

This should also be pretty cheap (just one pass through the LLM).

anuramat 5 days ago | parent | prev | next [-]

That's how BERT is trained: masked language modeling.
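A toy illustration of the masked-LM idea (BERT itself uses a transformer over subword tokens; here a bigram count model stands in, and the corpus and sentence are made-up assumptions): mask one position, then score candidate fills by how well they fit both the left and the right neighbor.

```python
from collections import Counter

# Tiny corpus; bigram counts play the role of BERT's learned context model.
corpus = ("she sent the report from the office "
          "he got a letter from a friend "
          "they filled in the form at the desk").split()

bigrams = Counter(zip(corpus, corpus[1:]))
vocab = sorted(set(corpus))

def fill_mask(words, i):
    """Pick the vocabulary word that best fits position i, scored by
    bigram counts against both neighbors (BERT-style bidirectional
    context, in miniature)."""
    def score(w):
        return bigrams[(words[i - 1], w)] + bigrams[(w, words[i + 1])]
    return max(vocab, key=score)

sentence = "he got a letter MASK a friend".split()
best = fill_mask(sentence, 4)  # masked position between "letter" and "a"
```

The real model scores every vocabulary token with the full transformer; for spotting typos like "form"/"from" you would compare the probability of the word actually present against the model's top prediction for that slot.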

dsign 5 days ago | parent [-]

I've used BERT to do that sort of thing. It was a prototype and I was using PyTorch; also, I'm not an expert on PyTorch performance. I also tried models that succeeded BERT for masked tokens. My first issue was that it was slow :-( . My second issue was that it wasn't integrated into my favorite text editor. But definitely useful.

anuramat 3 days ago | parent [-]

Did you try any diffusion models? They should be quick enough.

simianwords 6 days ago | parent | prev [-]

In fact, it could work entirely at the language level with a prompt like "mark parts of this paragraph that don't flow well".

golem14 5 days ago | parent | prev [-]

Some new form of stenography that everyone can use. It would make taking notes so much easier.