Unicode

Paul Prescod papresco@calum.csclub.uwaterloo.ca
Fri, 09 May 1997 17:12:20 -0400


Mike Wilson wrote:
> 
> (I have an interest in this area, but neither experience nor a large
> knowledge base, so grain-of-salt time.)
> 
> How about giving text objects encoding-system and language properties?
> 
> Text manipulation functions should be able to look at the encoding-system
> property and do the right thing without needing a `standard' encoding
> system and without effort by the high-level programmer.
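A minimal sketch of the suggestion above, assuming Python 3 and its built-in codecs. The class `TaggedText` and its fields are hypothetical and do not come from any existing library. Each object carries its own encoding and language tag, and character-level operations decode through that tag, so the high-level caller never has to pick an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical text object that carries its own encoding and language tag."""
    data: bytes     # raw bytes as stored or transmitted
    encoding: str   # codec name understood by bytes.decode, e.g. "latin-1"
    language: str   # language tag, e.g. "fr"

    def chars(self) -> str:
        # Decode with the object's own encoding property; the caller
        # never needs to know which encoding the bytes are in.
        return self.data.decode(self.encoding)

    def __len__(self) -> int:
        # Length in characters, not in bytes.
        return len(self.chars())

    def upper(self) -> "TaggedText":
        # Work on characters, then re-encode in the original encoding.
        return TaggedText(self.chars().upper().encode(self.encoding),
                          self.encoding, self.language)


word = "caf\u00e9"  # "café", with é as the single code point U+00E9
latin = TaggedText(word.encode("latin-1"), "latin-1", "fr")
utf8 = TaggedText(word.encode("utf-8"), "utf-8", "fr")
print(len(latin.data), len(utf8.data))  # 4 5  (byte counts differ)
print(len(latin), len(utf8))            # 4 4  (character counts agree)
```

The design choice is that the encoding travels with the data, so two objects holding different bytes still behave identically at the character level.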

Note that there is a difference between a character SET (a mapping from
character numbers to names/glyphs) and a character ENCODING (a mapping
from those numbers to bytes). The UTF-8 encoding stores each ASCII
character in a single byte, so it is very efficient for ASCII text, but
it still uses the Unicode character SET. So space efficiency is not
really an argument against Unicode.
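To make the set/encoding distinction concrete, here is a small sketch assuming Python 3, whose `str` type holds Unicode code points; the helper names `code_points` and `encoded_sizes` are illustrative only. The code points of a string are fixed by the character set, while the byte count depends on which encoding serializes them. The `-be` codec variants are used so that no byte-order mark is added to the count.

```python
def code_points(s: str) -> list[int]:
    # The character SET: each character's Unicode number (code point).
    return [ord(ch) for ch in s]


def encoded_sizes(s: str) -> dict[str, int]:
    # The character ENCODING: how those numbers become bytes.
    # "-be" variants avoid a byte-order mark inflating the size.
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


for sample in ("Hello, world", "\u65e5\u672c\u8a9e"):  # second sample is "日本語"
    print(sample, code_points(sample), encoded_sizes(sample))
```

For "Hello, world" this gives 12 bytes in UTF-8 against 24 in UTF-16 and 48 in UTF-32; for the three CJK characters UTF-16 is smaller (6 bytes against 9 in UTF-8). The code points are identical in every case, so which encoding is most compact depends on the text, not on the character set.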
 
See also: draft-ietf-html-charset-harmful-00.txt

> What about linguists who deal with languages not covered by Unicode,
> e.g. ancient Japanese, ancient Egyptian, Klingon, etc.?
> They'll need to use non-standard Unicode, which to me counts
> as a unique encoding system.

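There are private-use code points (the Private Use Area, U+E000 through
U+F8FF) set aside for non-standard characters. But I can see your point:
text in such an extended Unicode cannot be reliably transmitted without
additional information saying what those private code points mean.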
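A sketch, assuming Python 3's `unicodedata` module, of why private-use text needs that additional information: Unicode assigns no names to private-use code points, so a receiver can only interpret them through a table agreed on out of band. The table `HYPOTHETICAL_PRIVATE_NAMES` and its entries are invented for illustration. The ranges cover the BMP Private Use Area plus the supplementary private-use planes 15 and 16.

```python
import unicodedata

# Unicode Private Use ranges: the BMP area plus supplementary planes 15 and 16.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

# Hypothetical out-of-band agreement between sender and receiver.
HYPOTHETICAL_PRIVATE_NAMES = {"\ue000": "EXAMPLE GLYPH A", "\ue001": "EXAMPLE GLYPH B"}


def is_private_use(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def describe(text: str, private_names: dict[str, str]) -> list[str]:
    out = []
    for ch in text:
        if is_private_use(ch):
            # Meaning comes only from the private agreement, if we have it.
            out.append(private_names.get(ch, f"<unknown private U+{ord(ch):04X}>"))
        else:
            # Standard characters are named by Unicode itself.
            out.append(unicodedata.name(ch, f"<unnamed U+{ord(ch):04X}>"))
    return out


print(describe("A\ue000", HYPOTHETICAL_PRIVATE_NAMES))  # with the agreement
print(describe("A\ue000", {}))                          # without it
```

Without the table, the second call can only report the private character as unknown, which is the transmission problem described above.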
 
 Paul Prescod