miki123211 6 days ago
This really depends on the language. In some languages, pronunciation(a+b) == pronunciation(a) + pronunciation(b). Polish mostly belongs to this category, for example. For these, it's enough to go token-by-token. For English, it is not that simple, as e.g. the "uni" in "university" sounds completely different from the "uni" in "uninteresting." In English, even going word-by-word isn't enough, as words like "read" or "live" have multiple pronunciations, and speech synthesizers rely on the surrounding context to choose which one to use. This means you probably need to go sentence-by-sentence. Then you have the problem of what to do with code, tables, headings, etc. While screen readers can announce roles as you navigate text, they cannot do so when announcing the contents of a live region, so if that's something you want, you'd need to build a micro screen reader of sorts.
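To make the "go sentence-by-sentence" idea concrete, here is a rough sketch of buffering streamed tokens and releasing text only once a full sentence has arrived. `SentenceBuffer` and its boundary regex are hypothetical names made up for this illustration, not part of any screen reader or speech API, and the "., ! or ? followed by whitespace" rule is a naive assumption that will split wrongly on abbreviations like "e.g. this".

```python
import re
from typing import List

# Assumed heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Hypothetical helper: collects streamed tokens, emits whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, token: str) -> List[str]:
        """Add a token; return any sentences completed by it."""
        self._pending += token
        parts = _SENTENCE_END.split(self._pending)
        # The last piece may still be mid-sentence, so keep it buffered.
        self._pending = parts.pop()
        return [p for p in parts if p]

    def flush(self) -> List[str]:
        """Return whatever is left once the stream has ended."""
        rest = self._pending.strip()
        self._pending = ""
        return [rest] if rest else []
```

Each returned sentence is what you would hand to the live region (or the synthesizer) in one piece, so it has the context it needs to pick between the pronunciations of "read" or "live". Note that the last sentence only comes out on `flush()`, because without trailing whitespace there is no way to tell whether more text is coming.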