guappa 7 days ago
What if you need to find 5-letter words to play Wordle? Why would you care how many bytes they occupy or how wide they are on screen?
xigoi 7 days ago
In the case of Wordle, you know the exact set of letters you’re going to be using, which easily determines how to compute length.
guappa 7 days ago
No, no, I want to create tomorrow's puzzle.
tomsmeding 7 days ago
As the parent said:
> In the case of Wordle, you know the exact set of letters you’re going to be using
This holds for the generator side too. In fact, you have a fixed word list, and the fixed alphabet tells you what a "letter" is, and thus how to compute length. Because this concerns natural language, a letter will coincide with a grapheme cluster, and for English Wordle the grapheme count will in turn equal the byte length, because it won't give you words with é (I think). In other languages the grapheme clusters might be larger than 1 byte (e.g. [1], where they're code points).
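A minimal Python sketch of "the fixed alphabet tells you what a letter is": split a word into letters of a given alphabet and count those, then compare with code points and UTF-8 bytes. The `split_letters` helper, `ENGLISH`, and `TOY_CROATIAN` are hypothetical names for illustration; the toy alphabet assumes digraphs such as "lj" count as one letter, loosely modeled on Croatian. Greedy longest-match is a simplification, since real orthographies have words where a digraph's characters belong to separate letters.

```python
def split_letters(word: str, alphabet: set[str]) -> list[str]:
    """Split `word` into letters of `alphabet`, preferring the longest match.

    Hypothetical helper. Raises ValueError if part of the word is not in the alphabet.
    """
    longest = max(len(letter) for letter in alphabet)
    letters = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so that e.g. "lj" wins over "l".
        for size in range(min(longest, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if candidate in alphabet:
                letters.append(candidate)
                i += size
                break
        else:
            raise ValueError(f"{word[i]!r} is not in the alphabet")
    return letters


ENGLISH = set("abcdefghijklmnopqrstuvwxyz")
# Toy alphabet with multi-character letters, loosely modeled on Croatian (assumption).
TOY_CROATIAN = set("abcdefghijklmnoprstuvz") | {"lj", "nj", "dž", "č", "ć", "đ", "š", "ž"}

for word, alphabet in [("crane", ENGLISH), ("ljubav", TOY_CROATIAN)]:
    letters = split_letters(word, alphabet)
    # letters per the alphabet, code points, UTF-8 bytes
    print(word, letters, len(letters), len(word), len(word.encode("utf-8")))
```

Under these assumptions "ljubav" is a 5-letter word even though it is 6 code points and 6 bytes, which is exactly the kind of mismatch a word generator has to decide on up front.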
taneq 7 days ago
If you're playing at this level, you need to define:
- letter
- word
- 5
:P
guappa 7 days ago
Eh, in Macedonian they have some letters that in Russian are just 2 separate letters.
CorrectHorseBat 7 days ago
In German you have the same, only within one language. ß can be written as ss if it isn't available in a font, and only in 2017 did they add a capital version. So depending on the font and the Unicode version, the number of letters can differ.
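A small Python check of how the letter count shifts between spellings of Straße. It assumes CPython's `str.upper()`, which applies Unicode's default full case mapping and therefore turns ß into SS rather than the capital ẞ (U+1E9E).

```python
import unicodedata

# Three spellings of the same word: lowercase ß, capitals with SS,
# and capitals with the capital sharp s (U+1E9E).
spellings = ["Straße", "STRASSE", "STRA\u1E9EE"]
for s in spellings:
    print(s, len(s))  # 6, 7, 6 code points

# Default uppercasing maps ß to the two letters SS.
print("Straße".upper())  # STRASSE
print(unicodedata.name("\u1E9E"))  # LATIN CAPITAL LETTER SHARP S
```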
kbelder 6 days ago
"Traditionally, ⟨ß⟩ did not have a capital form, and was capitalized as ⟨SS⟩. Some type designers introduced capitalized variants. In 2017, the Council for German Orthography officially adopted a capital form ⟨ẞ⟩ as an acceptable variant, ending a long debate."
Thanks, that is interesting!
guappa 7 days ago
Should "ß" == "ss" evaluate as true?
birn559 7 days ago
I don't see why it should. I also believe the parent is wrong, as there are unambiguous rules about when to use ß or ss. I never thought of it, but maybe there are rules that allow the code point for ß to be displayed visually as ss? At least (from experience as a user) there seems to be a single "ss" code point.
CorrectHorseBat 7 days ago
> also believe parent is wrong as there are unambiguous rules about when to use ß or ss.
I never said it was ambiguous; I said it depends on the Unicode version and the font you are using. How is that wrong? (It seems the uppercase of ß is still SS in the latest Unicode, but since ẞ is the preferred capital version now, this should change in the future.)
birn559 7 days ago
> How is that wrong?
Not sure where, how, or if it's defined as part of Unicode, but so far I assumed that for a Unicode grapheme there exists a notion of what its visual representation should look like.
If Unicode still defines the uppercase of ß as SS, that's an error in Unicode due to slow adoption of the changes in the German language.
weinzierl 6 days ago
> ß as SS that's an error in Unicode
It's not. The uppercase of ß has always been SS. Before there was a separate code point in Unicode, this caused problems with round-tripping between upper and lower case. So Unicode rightfully introduced a separate code point specifically for that use case in 2008. This inspired designers to create a glyph for that code point that looks similar to ß. Nothing wrong with that. Some liked the idea and it gained a foothold, so in 2017 the Council for German Orthography allowed it as an acceptable variant. Maybe it will win, maybe not, but for now in standard German the uppercase of ß is still SS, and Unicode rightfully reflects that.
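The round-tripping problem is easy to reproduce in Python: uppercasing ß gives SS, and lowercasing SS gives ss, so the original spelling is lost. The capital ẞ (U+1E9E), by contrast, lowercases back to ß. This only illustrates Python's default, locale-independent case mapping; other libraries and locales may behave differently.

```python
word = "Straße"

# Default full case mapping: ß -> SS going up, SS -> ss coming back down.
round_trip = word.upper().lower()
print(round_trip)                  # strasse
print(round_trip == word.lower())  # False: the ß is gone

# With the capital sharp s, the lowercase form survives the trip.
capital = "STRA\u1E9EE"
print(capital.lower())             # straße
```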
CorrectHorseBat 6 days ago
In Unicode the default is still SS [1], while the Germans seem to have changed it to ẞ [2]. That means right now it's the same on every system, but once the Unicode standard changes and some systems get updated and others don't, there will be different behavior of len("ß".upper()) around. I don't know how or if systems deal with this, but ß should be rendered as ss if ß is unavailable in the font. It's possible this is completely up to the user.
[1] https://unicode.org/faq/casemap_charprop.html
[2] https://www.rechtschreibrat.com/DOX/RfdR_Amtliches-Regelwerk...
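Because the result depends on the Unicode tables a runtime ships with, one way to compare systems is to report both together. A minimal Python sketch, assuming CPython's `unicodedata.unidata_version` as the source of the Unicode Character Database version; `sharp_s_report` is a hypothetical helper name.

```python
import unicodedata


def sharp_s_report() -> dict:
    """Summarize how this runtime uppercases ß, tagged with its Unicode version."""
    upper = "ß".upper()
    return {
        "unicode_version": unicodedata.unidata_version,
        "upper": upper,
        "len_upper": len(upper),
    }


print(sharp_s_report())
```

Running this on two machines and diffing the output shows whether they would disagree on `len("ß".upper())`.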
weinzierl 6 days ago
> In unicode the default is still SS [1] while the Germans seem to have changed it to ẞ [2].
Where does the source corroborate that claim? Can you give us a hint where to find it in the source?
CorrectHorseBat 11 hours ago
Page 48 (quotes translated from German):
> E3: When writing in capital letters, besides using the capital letter ẞ, the spelling SS is also possible: Straße – STRAẞE – STRASSE.
While in older versions [1] it was the other way around:
> E3: When writing in capital letters, one writes SS. Alongside this, the use of the capital letter ẞ is also possible. Example: Straße – STRASSE – STRAẞE.
[1] https://www.rechtschreibrat.com/DOX/rfdr_Regeln_2016_redigie...
weinzierl 6 days ago
ẞ is not the preferred capital version; it is an acceptable variant (according to the Council for German Orthography).
guappa 7 days ago
Well, I don't speak German; I was asking.
birn559 7 days ago
I see; it wasn't clear to me at what level you were asking. The letter ß has never been generally equivalent to ss in the German language. From a user-experience perspective, though, it might be beneficial to pretend that "ß" == "ss" holds when parsing user input.
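One way to get that lenient comparison in Python is `str.casefold()`, which applies Unicode full case folding (ß folds to ss), while plain `==` keeps the strings distinct. `matches_input` is a hypothetical helper name, and this is a sketch for matching user input, not a statement about German spelling.

```python
def matches_input(user_input: str, expected: str) -> bool:
    """Compare loosely: case folding ignores case and maps ß to "ss"."""
    return user_input.casefold() == expected.casefold()


print("ß" == "ss")                         # False: different strings
print(matches_input("Strasse", "Straße"))  # True after case folding
print(matches_input("STRASSE", "straße"))  # True
```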
int_19h 6 days ago
That's not really any different from the distinction (or lack thereof) between "ae" and "æ". For that matter, in Russian there is a letter "ы" which is historically a digraph consisting of two separate letters, "ъ" and "i", but it has been treated as a single letter for so long that few people would even recognize it as a digraph. This kind of stuff is all language-specific, which is why for Wordle etc. you always need to be aware of the context, and this context will then unambiguously decide what constitutes a single letter.
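Unicode's own equivalence machinery does not settle this for you. A quick Python check using NFKC normalization plus case folding, a common language-agnostic "loose key" (`loose_key` is a hypothetical helper): ß folds to ss and the ﬁ ligature decomposes to fi, but æ stays æ, so treating "ae" and "æ" as the same letter sequence has to be a language-specific decision made by the application.

```python
import unicodedata


def loose_key(s: str) -> str:
    """NFKC-normalize, then case-fold: a language-agnostic loose comparison key."""
    return unicodedata.normalize("NFKC", s).casefold()


pairs = [("ß", "ss"), ("\ufb01", "fi"), ("æ", "ae")]
for a, b in pairs:
    print(a, b, loose_key(a) == loose_key(b))
# ß ss True
# ﬁ fi True
# æ ae False

# "ы" is a single code point; nothing in Unicode splits it into two letters.
print(len("ы"))  # 1
```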
taneq 7 days ago
Niße. ;)