La lingvo internacia kaj vi

I have of course been poking at the Kindle with a proverbial stick ever since I got it. I finally got around to testing the claim that it supports Unicode by loading it with pretty much every language I read.

Japanese does work. The Kindle will display unadorned Shift-JIS in the home page listings, although books are easiest to read if you convert to PDF first, so you can keep the customary vertical columns and ruby-annotated furigana. There are also converters available for taking folders full of scanned manga and converting them to clear PDFs sized for the Kindle screen. The display is sharp enough that the furigana in speech bubbles doesn't get any more lost than it does in print, and since most manga are printed in black and white plus screentones, the pictures are easy to render. If it handles Shift-JIS, then it probably also handles traditional and simplified Chinese, which share part of the character pool, although that is a guess: I don't have any e-materials in Chinese to check it with. I'll have to go back and look for the non-tedious bits of Genji Monogatari -- hilarious episodes do exist, mixed in with the rubbish soap opera -- and load them on.

All of the European languages I tried (French, Spanish, German) display fine. Most things that claim to support Unicode are good with these, although sometimes I run into something that chokes on, or simply omits to print, the accented characters.

This is not surprising. A lot of things that "support" Unicode sort of cheat. One of the great strengths of Unicode is that it contains what are called "combining accents". The idea is that you type a letter, then a combining accent, and the code on the combining accent says "step back one space and overstrike this on the character you printed there". Letters like à, é, and ç are usually present in extended character sets as single characters, since they're used extensively in at least one very common language, and sometimes in English loanwords; many things that "support" Unicode look at letter + combining accent and go "oh, hey, we have that thing, no need to fuss with combining" and just swap in the corresponding single character. It works well for a lot of western Europe, because for the most part the Romance and Germanic languages pull from a common pool of diacritical marks.
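The swap described above is roughly what Unicode calls normalization. Here is a minimal sketch in Python using the standard unicodedata module; the helper names compose and decompose are mine, purely for illustration, and this says nothing about what the Kindle actually does internally.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible letter.
decomposed = "e\u0301"
# U+00E9, the same letter stored as a single precomposed code point.
precomposed = "\u00e9"

def compose(text):
    """Swap letter + combining accent for the single character where one exists (NFC)."""
    return unicodedata.normalize("NFC", text)

def decompose(text):
    """Split precomposed letters back into base letter + combining accent (NFD)."""
    return unicodedata.normalize("NFD", text)

print(len(decomposed), len(precomposed))
print(compose(decomposed) == precomposed)
print(decompose(precomposed) == decomposed)
```

Software that only ever handles the composed form gets away with it as long as every letter it meets has a single-character equivalent in whatever font or character set it is using.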

The full Electric Kool-Aid Acid Test for Unicode, however, is Esperanto. There are multiple languages that use circumflexes over vowels and carons over consonants, and I may be missing one or two that use carons over vowels, but no other language I know uses circumflexes over consonants. (Circumflected c, s, h, g and j appear in Esperanto, equivalent to the ch in church, the sh in shine, the ch in loch, the g in gypsum and the j in bonjour, respectively. A breve over a u, which looks a lot like a caron, marks its status as a semi-consonant in diphthongs like 'au'. There are makeshift ways to write these if only standard ASCII is available, but they are all ugly kludges.) The common Western European extended character sets, such as Latin-1 and Windows-1252, contain none of the circumflected consonants, so anything that quietly falls back to one of those has no way to make them appear. To print them, either the Unicode combining accents must work correctly, or the device must really cover the precomposed forms in Unicode's Latin Extended-A block.
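For the curious, one of those ASCII kludges is the "x-system", where an x after the letter stands in for the accent (cx, gx, hx, jx, sx, ux). Below is a rough Python sketch of turning it back into proper letters. The function name from_x_system and the table are my own, and the approach is deliberately naive: it will happily mangle a non-Esperanto word that contains one of these pairs, such as a French name ending in -aux.

```python
# Hypothetical helper for illustration: x-system digraphs to accented letters.
X_SYSTEM = {
    "cx": "ĉ", "gx": "ĝ", "hx": "ĥ",
    "jx": "ĵ", "sx": "ŝ", "ux": "ŭ",
}

def from_x_system(text):
    """Replace each x-system pair with its accented letter, keeping case."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        letter = X_SYSTEM.get(pair.lower())
        if letter is not None:
            # Capitalise the result when the base letter was a capital.
            out.append(letter.upper() if pair[0].isupper() else letter)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(from_x_system("ehxosxangxo cxiujxauxde"))
```

The test phrase is the usual Esperanto showpiece because it contains all six accented letters; if a device shows every one of them, the font is not faking it.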

I am happy to report that the Kindle deals with Unicode fairly, and la ortografio korekta is displayed when reading Esperanto. (Note to other linguists: If you speak a Romance language first, beginning Esperanto will drive you bats in very short order. The only definite article is la and all singular nouns end in -o. Words ending in -a are adjectives. Have fun with that.) Project Gutenberg has a small selection of works written or translated by Esperantists; some are poetry, which gets very interesting very fast in Esperanto, because Esperanto is a stress-timed language, and by grammatical law the stress always falls on the penultimate syllable. Most authors deal with it by amputating endings where necessary to fit things into iambic or dactylic patterns -- because the language is perfectly regular, it's easy to reconstruct the inflections and therefore the meaning, but it's daunting at first glance.
