New i18n code
Lendvai Attila
Attila.Lendvai at netvisor.hu
Fri Feb 11 02:42:15 PST 2005
Yeah, great, thanks for this! :)
I'll take a look and make the windoze console code use UTF-16...
- 101
:: Excellent! I'll incorporate this into CVS shortly. I'll also add a
:: README with your notes so that this usage information and guide is
:: easy to track.
::
:: On Feb 10, 2005, at 2:52 PM, Olli Pietiläinen wrote:
::
:: > Hi.
:: >
:: > My recent work on the i18n code can be found at
:: > http://ollip.freeshell.org/slate-i18n.tar.bz2. I don't want to flood
:: > everyone's mailboxes by attaching it here.
:: >
:: > Much has changed since the previous version, the biggest difference
:: > being how character data is handled. Character data is now parsed
:: > from UnicodeData.txt and stored in a "two-phase table" (a kind of
:: > trie), that is, a table of tables. The first table is indexed by the
:: > bits of the code point above the lowest 7 (codePoint >> 7), and the
:: > resulting table is indexed by the lowest 7 bits. All empty blocks
:: > point to one shared empty block, and all blocks with duplicate data
:: > point to shared blocks too. Every duplicate item within the blocks
:: > (the properties of individual characters) is shared as well. This
:: > reduces the image size from over 8 MB with a flat table to only 4 MB.
:: > I think this can still be tweaked to take even less memory, with no
:: > impact on access speed.
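The two-phase table described above can be sketched in Python as follows. This is a minimal illustration, not the Slate code: build_two_phase_table and lookup are hypothetical names, property records are assumed to be hashable tuples, and the 7-bit block size is taken from the description.

```python
# Sketch of a two-phase table (two-level trie) for per-code-point data.
# Hypothetical names; assumes property records are hashable (e.g. tuples).

BLOCK_BITS = 7                       # low 7 bits index within a block
BLOCK_SIZE = 1 << BLOCK_BITS         # 128 entries per block
MAX_CODE_POINT = 0x10FFFF


def build_two_phase_table(properties, default=None):
    """properties: dict mapping code point -> property record."""
    num_blocks = (MAX_CODE_POINT >> BLOCK_BITS) + 1
    shared_items = {}    # one canonical object per distinct property record
    shared_blocks = {}   # one canonical tuple per distinct block
    index = []
    for block_no in range(num_blocks):
        base = block_no << BLOCK_BITS
        entries = []
        for offset in range(BLOCK_SIZE):
            value = properties.get(base + offset, default)
            # Duplicate records are stored once and shared.
            entries.append(shared_items.setdefault(value, value))
        block = tuple(entries)
        # Identical blocks (including all-empty ones) share one tuple.
        index.append(shared_blocks.setdefault(block, block))
    return index


def lookup(index, code_point):
    """Constant-time access: high bits pick the block, low 7 bits the entry."""
    return index[code_point >> BLOCK_BITS][code_point & (BLOCK_SIZE - 1)]


if __name__ == "__main__":
    props = {0x41: ("Lu",), 0x61: ("Ll",)}
    table = build_two_phase_table(props)
    print(lookup(table, 0x41))               # ('Lu',)
    print(len({id(b) for b in table}))       # 2: block 0 and the shared empty block
```

Every lookup is the same two indexing operations no matter how much sharing there is, which is why the sharing can save memory without affecting access speed.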
:: >
:: > This should be much faster than the old code-based hack. At the very
:: > least, access time is constant, and maintenance is much easier. The
:: > drawback is a larger image, but I think the growth is still modest.
:: > If a smaller image is wanted, only the needed parts of the data can
:: > be used. That's easy with the two-phase table: just make the unneeded
:: > blocks point to a shared empty block.
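Trimming the data in the way just described might look like the sketch below. restrict_to_ranges is a hypothetical helper, and the index is assumed to be a list of 128-entry blocks as in a two-phase table with 7-bit blocks. Only block references change, so lookups remain the same two indexing operations.

```python
# Hypothetical helper: keep only the blocks that overlap the wanted ranges and
# point every other block at a single shared empty block.

BLOCK_BITS = 7
BLOCK_SIZE = 1 << BLOCK_BITS
EMPTY_BLOCK = (None,) * BLOCK_SIZE   # shared by all discarded blocks


def restrict_to_ranges(index, keep_ranges):
    """index: list of blocks; keep_ranges: list of inclusive (first, last) code points."""
    result = []
    for block_no, block in enumerate(index):
        lo = block_no << BLOCK_BITS
        hi = lo + BLOCK_SIZE - 1
        needed = any(first <= hi and last >= lo for first, last in keep_ranges)
        result.append(block if needed else EMPTY_BLOCK)
    return result


if __name__ == "__main__":
    # Toy index of four blocks whose entries are just their code points.
    index = [tuple(range(b * BLOCK_SIZE, (b + 1) * BLOCK_SIZE)) for b in range(4)]
    trimmed = restrict_to_ranges(index, [(0x00, 0x7F)])   # keep ASCII only
    print(trimmed[0][0x41])          # 65
    print(trimmed[2][0x05])          # None
```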
:: >
:: > New additions are normalization to all four normalization forms
:: > (NFC, NFD, NFKC and NFKD) and UTF-16 encoding/decoding (including
:: > UTF-16BE and UTF-16LE). Normalization makes strings that look the
:: > same to the user also look the same to the system, and is required
:: > by many operations such as sorting. There are also lots of small
:: > fixes and minor enhancements.
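The following Python sketch illustrates both additions using Python's standard library rather than the Slate API: two spellings of "é" that differ as code point sequences compare equal after normalization, and a hand-written UTF-16BE/LE encoder and decoder show how code points above U+FFFF become surrogate pairs. encode_utf16 and decode_utf16 are hypothetical names; byte order marks (which plain UTF-16 usually uses to signal byte order) and malformed input are not handled.

```python
import unicodedata


def encode_utf16(text, big_endian=True):
    """Encode text as UTF-16BE or UTF-16LE without a byte order mark."""
    order = "big" if big_endian else "little"
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units = [cp]
        else:
            # Code points above the BMP become a surrogate pair.
            v = cp - 0x10000
            units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        for unit in units:
            out += unit.to_bytes(2, order)
    return bytes(out)


def decode_utf16(data, big_endian=True):
    """Decode UTF-16BE/LE bytes (no BOM handling, no error recovery)."""
    order = "big" if big_endian else "little"
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data) - 1, 2)]
    chars, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Recombine a high/low surrogate pair into one code point.
            chars.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            i += 2
        else:
            chars.append(chr(u))
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    precomposed = "\u00e9"      # é as one code point
    decomposed = "e\u0301"      # e + COMBINING ACUTE ACCENT
    print(precomposed == decomposed)                      # False
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        a = unicodedata.normalize(form, precomposed)
        b = unicodedata.normalize(form, decomposed)
        print(form, a == b)                               # True for every form
    # Only the compatibility forms fold the "fi" ligature U+FB01.
    print(unicodedata.normalize("NFKC", "\ufb01"))        # fi

    s = "a\U0001F600"
    print(encode_utf16(s).hex())          # 0061d83dde00
    print(encode_utf16(s, False).hex())   # 61003dd800de
```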
:: >
:: > utils.slate includes things that I think should live elsewhere. Take
:: > a look and move them where you think they belong, or leave them
:: > there. splitPreservingEmptys: should probably be merged with
:: > splitWith: in sequence.slate; splitWith: has the keyword
:: > &includeEmpty:, which currently doesn't do anything. I don't know
:: > where Int16(Read|Write)Stream LittleEndian/BigEndian might belong, or
:: > whether it should be named differently.
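Assuming splitPreservingEmptys: splits on a separator while keeping the empty fields that adjacent separators produce (which matters for UnicodeData.txt, where many of the semicolon-separated fields are empty), the two utilities might be sketched in Python as below. split_with and Int16Stream are hypothetical stand-ins for splitWith: with a working &includeEmpty: and for the Int16(Read|Write)Stream classes; the stream handles unsigned 16-bit units, since UTF-16 code units are unsigned.

```python
import io
import struct


def split_with(text, sep, include_empty=False):
    """Split text on sep; drop empty fields unless include_empty is true.
    Hypothetical analogue of splitWith: with a working &includeEmpty: option."""
    fields = text.split(sep)          # Python's str.split keeps empty fields
    return fields if include_empty else [f for f in fields if f]


class Int16Stream:
    """Read/write unsigned 16-bit units in a chosen byte order.
    Hypothetical: one class parameterized by byte order instead of
    separate LittleEndian/BigEndian classes."""

    def __init__(self, stream, little_endian):
        self.stream = stream
        self.fmt = "<H" if little_endian else ">H"

    def write(self, value):
        self.stream.write(struct.pack(self.fmt, value))

    def read(self):
        data = self.stream.read(2)
        if len(data) < 2:
            raise EOFError("fewer than 2 bytes left")
        return struct.unpack(self.fmt, data)[0]


if __name__ == "__main__":
    line = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
    fields = split_with(line, ";", include_empty=True)
    print(len(fields), fields[13])                  # 15 0061
    print(len(split_with(line, ";")))               # 7

    buf = io.BytesIO()
    Int16Stream(buf, little_endian=True).write(0xFEFF)
    print(buf.getvalue())                           # b'\xff\xfe'
```

Keeping the empty fields is what lets a parser address UnicodeData.txt fields by position (index 13 is the simple lowercase mapping); dropping them shifts every later field. Making the byte order a constructor parameter is one possible answer to the naming question, replacing the LittleEndian/BigEndian variants with a single class.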
:: >
:: > layout-builder.slate is a bit of a hack, but it isn't meant to be
:: > used by users. It generates the cross-link data for the tables, and
:: > that data should change only when the implementation changes or the
:: > Unicode Consortium releases a new version of the standard. The
:: > cross-links are stored in Links1.data and Links2.data, which are read
:: > by the table-building routines.
:: >
:: > Also, there's no mappings.slate anymore; its functionality is handled
:: > by properties.slate.
:: >
:: > Usage is simple: load 'src/i18n/init.slate' and run buildUnicodeTable.
:: > Most of the things that can be done with the old strings can be done
:: > with UnicodeStrings too, although I haven't checked most of the old
:: > functionality since my changes.
:: >
:: > Olli
::
:: --
:: Brian T. Rice
:: LOGOS Research and Development
:: http://tunes.org/~water/
::
::