| ▲ | jp1016 6 hours ago |
| One practical thing I appreciated about MessageFormat is how it eliminates a bunch of conditional UI logic. I used to write switch/if blocks for:
• 0 rows → “No results”
• 1 row → “1 result”
• n rows → “{n} results”
This seems trivial in English, but it gets messy once you support languages with multiple plural categories. I wasn’t really aware of how nuanced plural rules are until I dug into ICU. The syntax looked intimidating at first, but it actually removes a lot of branching from application code. I’ve been using an online ICU message editor (https://intlpull.com/tools/icu-message-editor) to experiment with plural/select cases, and trying different locales helped me understand edge cases much faster than reading the spec alone. |
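A minimal sketch of the idea in the comment above: the three branches become data, and one generic lookup replaces the switch/if block. This is not ICU's API. The names `english_category`, `pick` and `RESULTS_EN` are made up for illustration, and the lookup order (exact `=N` match, then plural category, then `other`) is an assumption modelled on how ICU-style plural messages are usually described.

```python
from typing import Callable, Dict


def english_category(n: int) -> str:
    """Plural category for English integers: 'one' for 1, 'other' for everything else."""
    return "one" if n == 1 else "other"


def pick(variants: Dict[str, str], n: int,
         category_of: Callable[[int], str] = english_category) -> str:
    """Choose a variant: exact '=N' key first, then the plural category, then 'other'."""
    for key in (f"={n}", category_of(n), "other"):
        if key in variants:
            # '#' stands in for the number, as in ICU-style plural messages.
            return variants[key].replace("#", str(n))
    raise KeyError("a plural message needs an 'other' variant")


# Hypothetical message data; in a real project this lives in the translation file.
RESULTS_EN = {"=0": "No results", "one": "# result", "other": "# results"}

for count in (0, 1, 5):
    print(pick(RESULTS_EN, count))
```

Supporting another language then means supplying a different `category_of` function and a different variants table, while the calling code stays the same.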
|
| ▲ | Vinnl 6 hours ago | parent | next [-] |
| This post shows a lot of the localisation challenges that many seemingly simple tools don't have an answer to: https://hacks.mozilla.org/2019/04/fluent-1-0-a-localization-... (Fluent informed much of the design of MessageFormat 2.) |
| |
| ▲ | draw_down 5 hours ago | parent [-] | | Indeed, if only it were as simple as “{n} rows”. I18n / l10n is full of things like this, important details that couldn’t be more boring or fiddly to implement. | | |
| ▲ | Joker_vD 4 hours ago | parent [-] | | Which is why Windows UI is littered with language like "number of rows: {n}". | | |
|
|
|
| ▲ | pferde 6 hours ago | parent | prev | next [-] |
| Hasn't gettext had this for decades? https://www.gnu.org/software/gettext/manual/html_node/Plural... |
| |
| ▲ | Muromec 6 hours ago | parent | next [-] | | Gettext has everything, it just takes knowing five languages to understand what to use for | |
| ▲ | Sharlin 5 hours ago | parent | prev | next [-] | | Yeah, some sort of pluralization support is pretty much the second most important feature in any message localization tool, right after the ability to substitute externally-defined strings in the first place. Even in a monolingual application, spamming plural formatting logic in application code isn't exactly the best practice. | |
| ▲ | iririririr 4 hours ago | parent | prev [-] | | gettext has everything, plus a huge ecosystem, like tools to coordinate collaboration among thousands of contributors. If alternatives don't start with a very strong case for why gettext wasn't a good option, that's already a good indicator of not-invented-here syndrome. | | |
| ▲ | moltonel 3 hours ago | parent [-] | | It's not hard to make a case against gettext, despite its maturity and large ecosystem. IMHO pluralization is a prime example, with an API that only cleanly handles the English case, requires the developer to be aware of translation gotchas, and has honestly confusing documentation and format. Compare that to MessageFormat's pluralization example (https://github.com/unicode-org/message-format-wg/blob/main/s...) which is very easy to understand and fully in the translator's hands. | | |
| ▲ | vsl 2 hours ago | parent [-] | | > IMHO pluralization is a prime example, with an API that only cleanly handles the English case That’s not true at all? Gettext is functionally limited to source code being English (or alike). It handles all translation languages just fine, and competently so. What it doesn’t have is MessageFormat’s gender selectors (useful) or formatting (arguably not really useful: it strays from translations to locales and is better solved with placeholders and locale-aware formatting code). > fully in the translator's hands. That is a problem that gettext doesn’t suffer from. You can’t reasonably expect translators to write correct DSL expressions. | | |
| ▲ | moltonel 8 minutes ago | parent [-] | | > Gettext is functionally limited to source code being English (or alike). It handles all translation languages just fine, and competently so. The *ngettext() family of functions takes two strings (typically singular/plural) and relies on a language-wide expression to choose the variant (possibly more than 2 variants). There's no good reason for taking two strings; this should be handled in the language file, even without a DSL. Ngettext handling a single countable makes some corner cases awkward, like gendering a group with possibly mixed-gender elements. The Plural-Forms expression not being per-message means that, for example, even in English "none/one/many foo" has to be handled in code, and that a language with only a rare 3rd plural has to pay the complexity for all cases. Arguably, those are all nitpicks, and Gettext is adequate for most projects. But quality translations get cumbersome very quickly. > You can’t reasonably expect translators to write correct DSL expressions. This feels demeaning. Translators regularly have to check the source code and often write templates; they're well able to handle a DSL like MessageFormat's, especially when it's always the same expressions for their language. It saves a trip to the bugtracker to get developers to massage their code into something translatable. You can't reasonably expect an English-speaking developer armed with ngettext to know (and prepare their code for) the subtleties of Gaelic numerals. |
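To make the "two strings" point concrete, here is a small sketch using Python's standard `gettext` module with no catalog loaded. `NullTranslations.ngettext` is a real call; the helper names `describe` and `describe_with_empty_state` are made up for illustration. Without a translation catalog, `ngettext` returns the first string when `n == 1` and the second otherwise, so a dedicated "none" wording has to be branched on in application code.

```python
import gettext

# No catalog loaded: this object applies the untranslated (English-style) fallback.
untranslated = gettext.NullTranslations()


def describe(n: int) -> str:
    """Format a file count using the two source strings that ngettext requires."""
    template = untranslated.ngettext("%d file", "%d files", n)
    return template % n


def describe_with_empty_state(n: int) -> str:
    """The 'none' wording is not a plural form here, so the caller branches on it."""
    if n == 0:
        return "no files"
    return describe(n)


for count in (0, 1, 3):
    print(describe(count), "|", describe_with_empty_state(count))
```

With a real catalog, the choice among translated variants would instead come from that catalog's Plural-Forms header, which applies to every message in the file.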
|
|
|
|
|
| ▲ | chokma 5 hours ago | parent | prev | next [-] |
| This reminds me of https://perldoc.perl.org/Locale::Maketext::TPJ13 Seems like to get it right for every use case / language, you would need functions to translate phrases - so switch statements may be a valid solution. The number of text elements needed for pagination, CRUD operations and similar UI elements should be finite :) |
|
| ▲ | 6 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | Muromec 6 hours ago | parent | prev | next [-] |
| I checked the spec and don't really get that. Something should specify the formula for choosing the correct form (i.e. the form for 1 is also used for 21 in Slavic languages), and the format isn't any better compared to the gettext of 30 years ago |
| |
| ▲ | gcr 5 hours ago | parent [-] | | This confused me too but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this. Let's take your example. In English, counting files looks like this:
You have {file_count, plural,
  =0 {no files}
  one {1 file}
  other {# files}
}
In Polish, there are several possible variants depending on the count:
Masz 1 plik
Masz 2,3,4 pliki
Masz 5-21 pliko'w
Masz 22-24 pliki
Masz 25-31 pliko'w
Your Polish translators would write:
Masz {file_count, plural,
  one {# plik}
  few {# pliki}
  other {# pliko'w}
}
The library (and your translators) know that in Polish, the `few` variant kicks in when `i%10 = 2..4 && i%100 != 12..14`, etc. I think the library just knows these rules for each language as part of the standard. Mozilla says that it was an explicit design goal to put "variant selection logic in the hands of localizers rather than developers". The point is that it's supported, it simplifies developer logic, and your translators know how to work with it. See https://www.unicode.org/cldr/charts/48/supplemental/language... (Apologies if I got the above translation strings wrong, I don't speak Polish. Just working from the GNU gettext example.) | | |
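The selection rule quoted above can be sketched in a few lines. This is a hand-written approximation for non-negative integers only, not the library's implementation. `polish_category` and `render` are hypothetical names, and the strings are copied from the comment as written. One assumption worth flagging: as far as I know, CLDR puts counts like 5-21 in a `many` category for Polish and keeps `other` for fractions, so a message that only defines `other` relies on `other` being the fallback.

```python
from typing import Dict


def polish_category(n: int) -> str:
    """Approximate CLDR cardinal category for a non-negative Polish integer."""
    if n == 1:
        return "one"
    # 'few': last digit 2..4, except when the last two digits are 12..14.
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    # Every remaining integer (0, 5..21, 25..31, ...) lands in 'many'.
    return "many"


def render(variants: Dict[str, str], n: int) -> str:
    """Pick the variant for n, falling back to 'other' when the category is absent."""
    template = variants.get(polish_category(n), variants["other"])
    return "Masz " + template.replace("#", str(n))


# Variants as given in the comment; there is no 'many' key, so 'other' is used.
FILES_PL = {"one": "# plik", "few": "# pliki", "other": "# pliko'w"}

for count in (1, 2, 5, 12, 21, 22, 25):
    print(count, polish_category(count), "->", render(FILES_PL, count))
```

The loop covers the same ranges as the listing in the comment: 2 and 22 select `few`, while 5, 12, 21 and 25 fall through to the `other` string.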
| ▲ | Muromec 33 minutes ago | parent | next [-] | | >This confused me too but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this. Well, making it out-of-band sure is one way to prevent lazy people from doing eval on plural forms from the po file. I hope the library is actually good then. | |
| ▲ | yorwba 4 hours ago | parent | prev | next [-] | | "the library just knows these rules for each language as part of the standard" sounds great until you try to support a small minority language that the library just doesn't know about and then you're left trying to hack around it by pretending that it's actually a regional variety of another language with similar plural rules. AFAIK, unlike gettext, MessageFormat doesn't allow you to specify a formula for the plural forms as part of the localization data, so the variant selection logic ended up in the hands of library developers rather than localizers or application developers. And the standard does get updated occasionally, which can also lead to bugs with localization data written against another version of the standard: https://github.com/cakephp/cakephp/issues/18740 | |
| ▲ | npodbielski 5 hours ago | parent | prev [-] | | usually it is ó instead of o' but otherwise very good :) |
|
|
|
| ▲ | iririririr 4 hours ago | parent | prev [-] |
| that's a lazy feature. dealing with this on the front end is the right thing so you can have rich empty states anyway. |