Remix.run Logo
withinboredom 6 days ago

> Not to disagree, but "non-english" isn't exactly relevant.

how so? programs might use english words but are decidedly not english.

motorest 6 days ago | parent [-]

> how so? programs might use english words but are decidedly not english.

I pointed out the fact that the concept of a language doesn't exist in token predictors. They are trained with a corpus, and LLMs generate outputs that reflect how the input is mapped in accordance to how the were trains with said corpus. Natural language makes the problem harder, but not being English is only relevant in terms of what corpus was used to train them.