shirro 3 days ago

Terminal emulators are caught between emulating the terminals and teletypes of the past and implementing new features, and Unicode is one of the struggles. The way most terminals and wcwidth handle character width is sometimes incorrect, but preserving that behavior is important for compatibility. It is possible that it's just not worth trying to handle all of Unicode perfectly in a terminal. It's pretty good for legacy stuff and sysadmin work. We have other ways of doing things remotely, like HTML, that might be more appropriate for ZWJ emoji and for languages with complicated text shaping/rendering.
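
To make the wcwidth problem concrete, here is a rough sketch that sums per-codepoint widths from Python's unicodedata tables. It is not glibc's or any library's actual wcwidth, and the rules are a simplifying assumption: East Asian Wide/Fullwidth counts as 2, combining marks and format characters such as ZWJ count as 0, and everything else counts as 1. `naive_width` is a hypothetical name.

```python
import unicodedata


def naive_width(s: str) -> int:
    """Sum per-codepoint widths, roughly the way wcwidth()-style code does.

    Simplified rules (an assumption, not any real implementation):
    W/F -> 2 cells, combining marks and Cf format chars -> 0, anything else -> 1.
    """
    total = 0
    for ch in s:
        # Combining marks and format characters (e.g. ZWJ U+200D) take no cell.
        if unicodedata.combining(ch) or unicodedata.category(ch) == "Cf":
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl
print(naive_width("abc"))   # 3
print(naive_width(family))  # 6
```

The ZWJ family sequence comes out as 6 cells even though a font that supports it draws a single glyph about 2 cells wide, and that kind of mismatch is what throws off cursor positioning.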

kevin_thibedeau 3 days ago

For glyph width, there are codepoints classified as ambiguous width. These are mostly narrow pre-emoji symbols that have been extended with an alternate emoji representation. There's no way to predict what their width will be, even with explicit variation selectors which might just be ignored.
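
A hypothetical width policy, sketched in Python, shows why the answer can't be derived from the codepoint alone: the width of an ambiguous ("A") character like ↔ (U+2194) depends on a preference flag, and VS16 (U+FE0F, emoji presentation) only matters if the terminal honors it. `string_width`, `ambiguous_wide` and `honor_vs16` are illustrative names, not settings of any real terminal, and the sketch assumes VS16 always widens its base.

```python
import unicodedata

VS15, VS16 = "\uFE0E", "\uFE0F"  # text / emoji presentation selectors


def string_width(s: str, ambiguous_wide: bool = False, honor_vs16: bool = True) -> int:
    """Width under one hypothetical policy; real terminals each pick their own."""
    widths = []
    for ch in s:
        if ch == VS16:
            # Emoji presentation requested: widen the preceding base character,
            # unless this (hypothetical) terminal ignores variation selectors.
            if honor_vs16 and widths:
                widths[-1] = 2
            continue
        if ch == VS15:
            continue  # text presentation: keep the base character's width
        if unicodedata.combining(ch):
            continue  # combining marks occupy no cell of their own
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            widths.append(2)
        elif eaw == "A":
            widths.append(2 if ambiguous_wide else 1)  # the ambiguous case
        else:
            widths.append(1)
    return sum(widths)


print(string_width("\u2194"))                       # 1
print(string_width("\u2194", ambiguous_wide=True))  # 2
print(string_width("\u2194\uFE0F"))                 # 2
print(string_width("\u2194\uFE0F", honor_vs16=False))  # 1
```

The same two characters yield 1 or 2 cells depending on settings the application printing them cannot see.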

lifthrasiir 3 days ago

> These are mostly narrow pre-emoji symbols that have been extended with an alternate emoji representation.

Nitpick: this is incorrect. Easy counter-examples would be arrow symbols like →. UAX #11 helpfully explains what is "ambiguous" about those characters:

> Ambiguous characters occur in East Asian legacy character sets as wide characters, but as narrow (i.e., normal-width) characters in non–East Asian usage. (Examples are the basic Greek and Cyrillic alphabet found in East Asian character sets, but also some of the mathematical symbols.) Private-use characters are considered ambiguous by default, because additional information is required to know whether they should be treated as wide or narrow.

In other words, these characters have been commonly available in both Asian and non-Asian character sets and were assigned different widths by them.
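
One rough way to see that legacy origin is to encode a few of these characters in Shift_JIS, where, as an approximation, double-byte characters were traditionally displayed double width. This uses Python's unicodedata and the built-in shift_jis codec; `legacy_row` is a hypothetical helper.

```python
import unicodedata


def legacy_row(ch: str) -> tuple:
    """Return (Unicode East Asian Width, byte length in Shift_JIS) for ch."""
    return unicodedata.east_asian_width(ch), len(ch.encode("shift_jis"))


# Latin A, RIGHTWARDS ARROW, Greek small alpha, Cyrillic capital Zhe
for ch in ["A", "\u2192", "\u03b1", "\u0416"]:
    eaw, nbytes = legacy_row(ch)
    print(f"U+{ord(ch):04X} eaw={eaw} shift_jis={nbytes} byte(s)")
```

The arrow, alpha and Zhe are all classed "A" and all take two bytes in Shift_JIS, while the Latin letter is "Na" and takes one, which matches UAX #11's description.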

kevin_thibedeau 2 days ago

There are preexisting narrow symbols that were given a new emoji presentation in later standards rather than assigning a new codepoint. Text rendering engines vary on which form is the default. VTE had an option to set the preference. This can be very annoying when some arrows get the new emoji form but others in their cohort stay as narrow glyphs.

charcircuit 3 days ago

The terminal emulator knows what font is being used so it should be possible to predict it.

kevin_thibedeau 2 days ago

The terminal application doesn't know the font. The best you can do is discover ambiguous widths by printing codepoints and checking the change in cursor position. That's a suboptimal experience.
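
A sketch of that probing approach for a Unix terminal, assuming it answers the standard DSR request `ESC [ 6 n` with a cursor position report `ESC [ row ; col R`: switch to raw mode, print the character at column 1, and compare the column before and after. `probe_width` and `parse_cpr` are hypothetical names; there is a timeout, but keystrokes that arrive mid-probe are not handled carefully.

```python
import os
import re
import select
import sys
import termios
import tty
from typing import Tuple

# A Cursor Position Report (CPR) looks like: ESC [ row ; col R
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(data: bytes) -> Tuple[int, int]:
    """Extract (row, col) from a CPR reply, ignoring any bytes around it."""
    m = _CPR.search(data)
    if m is None:
        raise ValueError(f"no cursor position report in {data!r}")
    return int(m.group(1)), int(m.group(2))


def _query_cursor(fd_in: int, fd_out: int, timeout: float) -> Tuple[int, int]:
    os.write(fd_out, b"\x1b[6n")  # DSR 6: ask the terminal where the cursor is
    buf = b""
    while not buf.endswith(b"R"):
        ready, _, _ = select.select([fd_in], [], [], timeout)
        if not ready:
            raise TimeoutError("terminal did not answer the DSR request")
        buf += os.read(fd_in, 1)
    return parse_cpr(buf)


def probe_width(ch: str, timeout: float = 0.5) -> int:
    """Print ch at column 1 and return how many columns the cursor advanced.

    Assumes stdin and stdout are the same interactive terminal.
    """
    fd_in, fd_out = sys.stdin.fileno(), sys.stdout.fileno()
    saved = termios.tcgetattr(fd_in)
    try:
        tty.setraw(fd_in)  # no echo or line buffering, so the reply can be read
        os.write(fd_out, b"\r")
        _, start = _query_cursor(fd_in, fd_out, timeout)
        os.write(fd_out, ch.encode("utf-8"))
        _, end = _query_cursor(fd_in, fd_out, timeout)
        os.write(fd_out, b"\r\x1b[K")  # back to column 1, erase the probe glyph
    finally:
        termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
    return end - start


if __name__ == "__main__" and sys.stdin.isatty() and sys.stdout.isatty():
    for sample in ["a", "\u2192", "\u4e00", "\U0001F600"]:
        print(f"U+{ord(sample):04X}: {probe_width(sample)} column(s)")
```

Each probe costs two round trips to the terminal, and the characters briefly appear on screen, which is part of why this is a clumsy way to learn widths, especially over a slow SSH link.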

anthk 2 days ago

It handles this fine with GNU Unifont and raw XTerm, among others. I only had issues with RXVT and its clones.