radpanda 4 days ago

> There are, in fact, 88 approved Icelandic names with this exact pattern of declension, and they all end with “dur”, “tur” or “ður”.

> But that quickly breaks down. There are other names ending with “ður” or “dur” that follow a different pattern of declension

My “everything should be completely orderly” comp-sci brain is always triggered by these almost trivial problems that end up being much more interesting.

Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix? If one wanted to improve upon your work for unknown names, rather than consider the letters used, would you have to do some NLP on the name to get a representation of the pronunciation and look that up (in a trie or otherwise)?

dmit 4 days ago | parent | next [-]

> Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix?

Careful, this is how you fall down the Are Dependent Types The Answer?? hole.

perching_aix 4 days ago | parent [-]

Not sure what that's supposed to mean, but if Icelandic is anything like my native language in this, then it is indeed a pronunciation based thing. Which should make sense, since languages are (historically) spoken first, written second.

dmit 4 days ago | parent [-]

Heheh, it was mostly a reference to my [and mostly others'!] experiments with encoding human languages in a programming language. There are some pretty neat ideas there to explore, like the difference between Subject-Object-Verb (SOV) and Object-Subject-Verb (OSV) word order. Or postfix languages (e.g. Forth) mapping to some human languages.

In this particular example, having a subsequent part of an expression rely on prior parts would usually be accomplished at runtime in most languages. But some (like Idris) might allow you to encode the rules in the type system. Thus the rabbit hole.

perching_aix 4 days ago | parent [-]

Ah okay. That's a journey I'm currently also preparing to embark on, though from the other direction: I'm trying to generate "natural" language from program code. I already know it's pretty hopeless, but increasingly I feel like it's not really a choice anyhow, so I may as well finally have a go at it. Let's see :)

dmit 4 days ago | parent [-]

Godspeed!

alexharri 4 days ago | parent | prev [-]

Hmm, good idea. There are names that have the exact same pronunciation yet have different patterns of declension, for example:

- Ástvaldur -> ur,,i,ar
- Baldur -> ur,ur,ri,urs

The "aldur" ending is pronounced in the exact same manner, but applying the declension pattern of "Ástvaldur" to "Baldur" would yield:

- Baldur
- Bald
- Baldi
- Baldar

The last three forms feel very wrong (I asked my partner to verify and she cringed).
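To make "applying a pattern" concrete, here is a minimal Python sketch. It assumes one reading of the comma-separated notation above: the four entries are the endings for the nominative, accusative, dative and genitive, and the first entry is the ending to strip off the name before appending the others. The function name `apply_pattern` is made up for this example.

```python
def apply_pattern(name: str, pattern: str) -> list[str]:
    """Decline `name` with a comma-separated suffix pattern.

    Assumption: the pattern lists four endings in the order
    nominative, accusative, dative, genitive, and the first ending
    is the one the name already carries.
    """
    suffixes = pattern.split(",")
    nominative_ending = suffixes[0]
    if not name.endswith(nominative_ending):
        raise ValueError(f"{name!r} does not end with {nominative_ending!r}")
    # Strip the nominative ending to get the stem, then attach each ending.
    stem = name[: len(name) - len(nominative_ending)]
    return [stem + suffix for suffix in suffixes]


# Each name with its own pattern.
print(apply_pattern("Ástvaldur", "ur,,i,ar"))   # ['Ástvaldur', 'Ástvald', 'Ástvaldi', 'Ástvaldar']
print(apply_pattern("Baldur", "ur,ur,ri,urs"))  # ['Baldur', 'Baldur', 'Baldri', 'Baldurs']

# Baldur with the pattern borrowed from Ástvaldur: the wrong forms listed above.
print(apply_pattern("Baldur", "ur,,i,ar"))      # ['Baldur', 'Bald', 'Baldi', 'Baldar']
```

Nothing in the function can tell that the last call is wrong, which is the point: the spelling (and the pronunciation) of the ending alone does not pick the pattern.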

Spoken Icelandic is surprisingly close to its written form. I wouldn't expect very different results for the trie if a "phonetic" version of names and their endings were used instead of their written forms.
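For anyone curious what a lookup by ending might look like, here is a rough Python sketch of a trie keyed on reversed endings, where the longest matching ending wins. It is an illustration under assumptions, not the actual implementation: `build_suffix_trie` and `lookup` are hypothetical names, and treating "valdur" as a general ending is something I made up from the Ástvaldur example purely to have two entries that share a tail.

```python
END = ""  # marker key; cannot collide with a single-character key


def build_suffix_trie(entries: dict[str, str]) -> dict:
    """Build a trie from {ending: pattern}, walking each ending right to left."""
    root: dict = {}
    for ending, pattern in entries.items():
        node = root
        for ch in reversed(ending):
            node = node.setdefault(ch, {})
        node[END] = pattern
    return root


def lookup(trie: dict, name: str):
    """Return the pattern of the longest stored ending that `name` ends with."""
    node = trie
    best = None
    for ch in reversed(name):
        if ch not in node:
            break
        node = node[ch]
        if END in node:
            best = node[END]  # remember the deepest (longest) match so far
    return best


# Illustrative entries only (see the assumptions above).
trie = build_suffix_trie({
    "valdur": "ur,,i,ar",
    "Baldur": "ur,ur,ri,urs",
})

print(lookup(trie, "Ástvaldur"))  # ur,,i,ar
print(lookup(trie, "Baldur"))     # ur,ur,ri,urs
print(lookup(trie, "Jón"))        # None
```

Switching to a phonetic representation would only change the strings used as keys and queries; the two names still share the same "aldur" tail, so the trie has to look one letter further back to tell them apart either way.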