kevin_thibedeau 18 hours ago
They are treated like double-width characters. All it takes is a Unicode-aware layout algorithm that tracks double-width codepoints. The tricky part is older symbols that were originally single-width, non-emoji characters and now have an ambiguous width depending on the terminal environment's default presentation mode.
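The double-width tracking described above can be sketched with Python's standard `unicodedata` module. This is a minimal per-codepoint sketch, not any particular terminal's algorithm: it counts East Asian Wide/Fullwidth codepoints as two cells, combining marks and format characters (such as ZWJ) as zero, and exposes the East Asian Ambiguous category through a hypothetical `ambiguous_wide` flag. Widening a narrow symbol when VS16 follows it is an assumed heuristic; real terminals disagree on exactly this case.

```python
import unicodedata

VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16 (requests emoji presentation)


def codepoint_width(ch, ambiguous_wide=False):
    """Cells one codepoint occupies, judged in isolation (simplified sketch)."""
    # Nonspacing/enclosing marks and format characters (e.g. ZWJ) take no cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        # East Asian Ambiguous: the width depends on terminal/locale settings.
        return 2 if ambiguous_wide else 1
    return 1


def string_width(s, ambiguous_wide=False):
    """Sum of per-codepoint widths, with an assumed VS16 widening heuristic."""
    total = 0
    for i, ch in enumerate(s):
        w = codepoint_width(ch, ambiguous_wide)
        # Assumption: a narrow symbol followed by VS16 is drawn as a 2-cell emoji.
        if w == 1 and i + 1 < len(s) and s[i + 1] == VS16:
            w = 2
        total += w
    return total


for text in ["abc", "\u754c", "e\u0301", "\u2764", "\u2764\ufe0f", "\u2192"]:
    print(repr(text), string_width(text), string_width(text, ambiguous_wide=True))
```

Because it works one codepoint at a time, this sketch over-counts multi-codepoint emoji sequences: the heart-on-fire sequence (U+2764 U+FE0F U+200D U+1F525) comes out as 4 cells instead of the single double-width block it should occupy.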
rmunn 18 hours ago
That's how it should work, and how it does work in terminals that are doing it right. Browsers, however, look at the monospaced font and say, "Okay, Source Code Pro doesn't have the U+2192 codepoint, so let me find a font that does." (U+2192 is the → arrow.) On my Linux+Firefox setup, the browser chose Menlo to render the → in the "The fastest way to go from 0 → 1" banner. Menlo's glyph width isn't quite identical to Source Code Pro's, so the ┃ character on the right of the box was ever so slightly misaligned. That happens because Firefox isn't following strict fixed-width layout rules: it allows itself to use other fonts with different horizontal widths, even inside a <pre> block. (I haven't looked at this article in other browsers, but I bet they behave the same, since everyone's mentioning misalignment.)
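The misalignment mechanism can be illustrated with a small Python sketch using made-up numbers. It assumes the primary monospace font advances every glyph by 0.600 em and the fallback font advances its glyph by 0.602 em; both values are hypothetical, not measured from Source Code Pro or Menlo. Browser-style layout sums each glyph's own advance, so one fallback glyph shifts everything after it on that line, while terminal-style layout snaps every character to the same cell width. `glyph_positions` and its parameters are illustrative names only.

```python
# Hypothetical advance widths in em units; real values depend on the installed fonts.
PRIMARY_ADVANCE = 0.600   # assumed advance of every glyph in the primary monospace font
FALLBACK_ADVANCE = 0.602  # assumed advance of a glyph borrowed from a fallback font


def glyph_positions(line, fallback_chars, font_size_px=16.0, snap_to_grid=False):
    """Return the x-offset (px) at which each character of `line` starts.

    snap_to_grid=False: browser-style, each glyph advances by its own font's width.
    snap_to_grid=True: terminal-style, every character advances by one fixed cell.
    """
    positions = []
    x = 0.0
    for ch in line:
        positions.append(x)
        if snap_to_grid or ch not in fallback_chars:
            advance = PRIMARY_ADVANCE
        else:
            advance = FALLBACK_ADVANCE
        x += advance * font_size_px
    return positions


with_arrow = "┃ 0 → 1 ┃"
without_arrow = "┃ 0 - 1 ┃"
fallback = {"→"}  # assume only the arrow comes from the fallback font

browser_a = glyph_positions(with_arrow, fallback)
browser_b = glyph_positions(without_arrow, fallback)
grid_a = glyph_positions(with_arrow, fallback, snap_to_grid=True)
grid_b = glyph_positions(without_arrow, fallback, snap_to_grid=True)

# Compare where the closing ┃ starts on each line.
print(f"browser-style drift: {browser_a[-1] - browser_b[-1]:.3f} px")
print(f"grid-style drift:    {grid_a[-1] - grid_b[-1]:.3f} px")
```

With these assumed widths the drift is tiny, which matches "ever so slightly", but it accumulates with every fallback glyph on a line.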
rmunn 18 hours ago
The other tricky part is emojis made up of multiple codepoints combined with zero-width joiner characters, variation selectors, or other symbols. E.g. [US flag emoji] is made up of U+1F1FA REGIONAL INDICATOR SYMBOL LETTER U followed by U+1F1F8 REGIONAL INDICATOR SYMBOL LETTER S. Or take [heart-on-fire emoji], which should render as a single symbol (a burning heart, or heart on fire) in one double-width block, but is made up of the four-codepoint sequence U+2764 HEAVY BLACK HEART, U+FE0F VARIATION SELECTOR-16, U+200D ZERO WIDTH JOINER, and U+1F525 FIRE. Then there are even more complicated sequences like [woman-heart-man emoji], which again should render in a single block but are made up of six(!) codepoints: U+1F469 WOMAN, U+200D ZERO WIDTH JOINER, U+2764 HEAVY BLACK HEART, U+FE0F VARIATION SELECTOR-16, U+200D ZERO WIDTH JOINER, and U+1F468 MAN.

The number of codepoints never did correspond exactly to the number of fixed-width blocks a character should take up. U+00E9 é is canonically equivalent to U+0065 e plus U+0301 COMBINING ACUTE ACCENT, so it should be rendered in a single block, but it might be one or two codepoints depending on whether the text was composed or decomposed before reaching the rendering engine. With emojis in play, though, the number of possibilities jumps dramatically, and it's no longer sufficient to just count base characters and ignore diacritics: you have to actually compute the renderings of all those valid emoji combinations (or pre-calculate them in a good lookup table, which IIRC is what Ghostty does).

P.S. The Hacker News comments stripped out those emojis; fair enough. They were, in order:
- a US flag emoji (made up of two codepoints)
- a heart-on-fire symbol (two distinct symbols combined into a single image, made up of four codepoints total)
- a woman and a man with a heart between them (three distinct symbols combined into a single image, made up of six codepoints total)
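The gap between codepoint counts and display cells described above can be seen by grouping codepoints into clusters before measuring them. The Python sketch below is a deliberately simplified stand-in for real grapheme segmentation (Unicode UAX #29): it only handles combining marks, VS16, ZWJ sequences, and regional-indicator pairs, and it assumes every multi-codepoint emoji cluster occupies two cells. The function names are hypothetical, and this is not how Ghostty or any other terminal actually implements it.

```python
import unicodedata

ZWJ = "\u200d"   # U+200D ZERO WIDTH JOINER
VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16


def is_regional_indicator(ch):
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def clusters(s):
    """Split `s` into user-perceived characters.

    Simplified: handles combining marks, VS16, ZWJ sequences and flag pairs only.
    This is NOT a full UAX #29 grapheme segmenter.
    """
    out = []
    i = 0
    while i < len(s):
        cluster = s[i]
        i += 1
        # Two regional indicators in a row form one flag.
        if is_regional_indicator(cluster) and i < len(s) and is_regional_indicator(s[i]):
            cluster += s[i]
            i += 1
        while i < len(s):
            ch = s[i]
            if ch == VS16 or unicodedata.category(ch) in ("Mn", "Me"):
                # Combining marks and variation selectors attach to the current cluster.
                cluster += ch
                i += 1
            elif ch == ZWJ:
                # ZWJ glues the following codepoint onto the current cluster.
                cluster += s[i:i + 2]
                i += 2
            else:
                break
        out.append(cluster)
    return out


def cluster_width(cluster):
    """Assumed cell count for one cluster: 2 for emoji sequences and wide chars, else 1."""
    if len(cluster) > 1 and (ZWJ in cluster or VS16 in cluster
                             or is_regional_indicator(cluster[0])):
        return 2
    return 2 if unicodedata.east_asian_width(cluster[0]) in ("W", "F") else 1


def display_width(s):
    return sum(cluster_width(c) for c in clusters(s))


flag = "\U0001F1FA\U0001F1F8"
heart_on_fire = "\u2764\ufe0f\u200d\U0001F525"
couple = "\U0001F469\u200d\u2764\ufe0f\u200d\U0001F468"
e_composed = "\u00e9"
e_decomposed = unicodedata.normalize("NFD", e_composed)  # "e" + U+0301

for name, text in [("flag", flag), ("heart on fire", heart_on_fire),
                   ("couple", couple), ("é (NFC)", e_composed),
                   ("é (NFD)", e_decomposed)]:
    print(f"{name}: {len(text)} codepoints, {len(clusters(text))} cluster(s), "
          f"{display_width(text)} cells")
```

In practice you would want a complete UAX #29 segmenter (for example, the third-party `regex` module's `\X` pattern) together with current Unicode emoji data, since the simplified rules above miss many sequences and terminals still disagree on some widths.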