Philip Newton (pne) wrote,

FUD about Unicode and Urdu

This Language Log entry pointed me to this article about Urdu calligraphy, and how it's still being used for practically all "printed" works (rather than block type or computerised DTP).

It was an interesting read, but I was a little annoyed about reading what I considered to be FUD about Unicode.

The article points out that "Any combination of two or more successive characters will require a particular shaped ligature and proportion - many more than are available in Unicode’s Arabic Extended section", which is true.

However, it goes on to claim that "There’s an upper limit to the performance Unicode can give us for nasta`liq (in the number of available glyphs)", and I feel that that is emphatically not true.

After all, Unicode primarily deals in characters, not glyphs. Even if there were no pre-shaped Arabic characters (i.e. specifically initial, medial, final, or isolated shapes), let alone ligatures, Unicode doesn't prevent you from typesetting beautiful Arabic any more than it prevents you from typesetting beautiful Devanagari with all sorts of complex ligatures.

It may be very difficult to make a font containing all the necessary ligatures for high-quality nasta`liq, and very difficult to make a high-quality renderer that will take Unicode code points and choose the appropriate "particular shaped ligature and proportion" (even the height above baseline will depend on what has come before, for example) -- but I feel that this is not a shortcoming of Unicode.
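To make the division of labour concrete, here is a deliberately tiny sketch of the renderer's side of the job: it takes plain letters and decides which positional shape each one needs. Everything in it is my own simplification: the `positional_forms` name is made up, the `RIGHT_JOINING` set is a hand-picked subset, and it assumes the input contains only Arabic letters. A real nasta`liq engine would have to do vastly more than this.

```python
# Toy contextual shaper: an illustrative sketch, not a real text-layout engine.

# Letters that connect only to the letter before them (hand-picked subset:
# alef, dal, thal, reh, zain, waw). Every other letter is assumed to be
# dual-joining, which is a simplification.
RIGHT_JOINING = {"\u0627", "\u062F", "\u0630", "\u0631", "\u0632", "\u0648"}

def positional_forms(word):
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter."""
    forms = []
    for i, letter in enumerate(word):
        # A letter connects backwards only if the previous letter joins forwards.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        # A letter connects forwards only if it can, and something follows it.
        joins_next = i < len(word) - 1 and letter not in RIGHT_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

# meem, hah, meem, dal: four plain code points from the "Arabic" block.
print(positional_forms("\u0645\u062D\u0645\u062F"))
```

The input is nothing but basic code points; every decision about shape is made afterwards, by the software and (in a real system) the font.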

I think that it's not hard to agree that whether a text is calligraphed in a naskh-like style or a nasta`liq-like style, it'll be made up of individual letters, even if a given abstract letter may be represented by many different concrete shapes depending on context. Regardless of how, say, the letter meem will look in a specific position in a specific word, there'll be a bit of ink that is underlyingly, semantically, a "meem", and as I understand it, Unicode is within its rights to encode a meem as a single code point no matter where it occurs. (As a weak analogy: you don't have a separate Unicode code point for an "AV" combination that makes the V move over a little to the left to fit in more pleasingly with the right side of the A, because kerning is not part of Unicode.)
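As a small illustration of the meem point, Python's standard `unicodedata` module can show that the four pre-shaped meem code points (which Unicode carries for compatibility) all fold back to the one abstract letter under NFKC normalisation. The helper name `base_letters` is just something I made up for this sketch.

```python
import unicodedata

MEEM = "\u0645"  # ARABIC LETTER MEEM, the abstract character

# The four compatibility "presentation form" code points for meem:
# isolated, final, initial and medial.
PRESENTATION_FORMS = ["\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"]

def base_letters(forms):
    """Fold each presentation form back to its underlying letter (NFKC)."""
    return [unicodedata.normalize("NFKC", form) for form in forms]

for form in PRESENTATION_FORMS:
    print(f"U+{ord(form):04X} {unicodedata.name(form)}")

# Every shape is, underlyingly, the same meem.
print(all(letter == MEEM for letter in base_letters(PRESENTATION_FORMS)))
```

In other words, even the standard itself treats the shaped variants as mere presentations of one character.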

Now if the author of the text had lamented the dearth of high-quality Unicode rendering and fonts, that would be fine -- but placing the blame on Unicode itself is not on. I don't know of any reason why the number of glyphs in a font would have to be constrained by the number of Unicode code points available (and indeed, I've seen ligatures in computer Devanagari that I know are not encoded as single code points in Unicode), though I'm not an expert in that sort of thing.
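The Devanagari case is easy to check for yourself. The sketch below takes the conjunct usually drawn as a single ligature, "kssa", and lists what it is actually made of; the `describe` helper is my own invention, and the code says nothing about how any particular font draws the result.

```python
import unicodedata

# The conjunct commonly rendered as one ligature glyph, stored as
# KA + VIRAMA + SSA.
KSSA = "\u0915\u094D\u0937"

def describe(text):
    """List each code point in text with its official Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code_point, name in describe(KSSA):
    print(code_point, name)

print(len(KSSA), "code points, typically drawn as one glyph")
```

There is no single code point for the ligature; the font simply contains a glyph for it, and the renderer picks that glyph when it sees the sequence.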

So by all means, use the few dozen basic characters in the Unicode "Arabic" block, and then use a good rendering engine coupled with a large font to make pretty nasta`liq. If you don't have a good rendering engine, blame Microsoft or Apple or whoever supplies your OS's renderer; if you don't have a good font, blame font houses. But don't complain that Unicode inherently limits the performance you can get with nasta`liq.
