ehnto 3 days ago

Would they not quickly become divergent vectors? In the same way that "apple" and "Apple" can exist in the same vector set with totally different meanings?

So all the information gleaned from reading a glyph in the context of Japanese articles would end up in totally different vectors from the information gleaned from the same glyph in Chinese?

numpad0 3 days ago | parent [-]

I don't know, but at least older Qwen models were a bit confused as to which words belong to which languages, and recent ones seem noticeably less sure about ja-JP in general. Maybe it vaguely relates to Hanzi/Kanji characters being more coarse-grained than the Latin alphabet, so that a given text doesn't contain enough characters to tell the languages apart, or something.
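
A rough way to see the character-count point, as an illustration only (nothing here was run against Qwen or any other model): the sketch below counts characters by script using Python's standard `unicodedata` module. The helper name `script_profile` and the sample strings are made up for this example. A word written only in Han characters gives very few characters to go on, and the same code points can appear in both Japanese and Chinese text, while kana is a signal that only Japanese provides.

```python
import unicodedata


def script_profile(text: str) -> dict[str, int]:
    """Count characters by rough script class, using Unicode character names.

    Hypothetical helper for illustration; real language identification
    uses far more than script counts.
    """
    counts = {"han": 0, "kana": 0, "latin": 0, "other": 0}
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["han"] += 1
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            counts["kana"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts


# "大学" (university) is written with the same two code points in Japanese
# and in simplified Chinese, so script counts alone cannot separate them.
han_only = script_profile("大学")

# Adding kana ("大学に行く", "go to university") gives a Japanese-only signal.
with_kana = script_profile("大学に行く")

# The English word spreads the same concept over many more characters.
english = script_profile("university")

print(han_only, with_kana, english)
```

The Han-only word yields two characters where the English one yields ten, which is one way to read the "not enough characters to tell apart" guess above. How a given tokenizer actually splits these strings is a separate question that this sketch does not address.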