symbian-qemu-0.9.1-12/python-2.6.1/Modules/cjkcodecs/README
changeset 1 2fb8b9db1c86
       
     1 To generate or modify mapping headers
       
     2 -------------------------------------
       
     3 Mapping headers are imported from CJKCodecs in pre-generated form.
       
     4 If you need to tweak or add something to them, please look at the tools/
       
     5 subdirectory of the CJKCodecs distribution.
       
     6 
       
     7 
       
     8 
       
     9 Notes on implementation characteristics of each codec
       
    10 -----------------------------------------------------
       
    11 
       
    12 1) Big5 codec
       
    13 
       
    14   The big5 codec maps the following characters as cp950 does, rather
       
    15   than following Unicode.org's mapping, which maps them to 0xFFFD.
       
    16 
       
    17     BIG5        Unicode     Description
       
    18 
       
    19     0xA15A      0x2574      SPACING UNDERSCORE
       
    20     0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
       
    21     0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
       
    22     0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
       
    23     0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
       
    24     0xA2CC      0x5341      HANGZHOU NUMERAL TEN
       
    25     0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY
       
    26 
       
    27   Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
       
    28   to other big5 codes, round-trip compatibility is not guaranteed for
       
    29   them.
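
  The round-trip caveat can be checked with a short Python 3 sketch.
  The helper name big5_roundtrip is hypothetical, and the byte values
  are the four codes named in the paragraph above. It assumes the
  standard "big5" codec that ships with Python.

```python
# Big5 codes from the table above whose Unicode targets are also
# reachable from other Big5 codes.
CP950_STYLE_CODES = [b"\xa1\xfe", b"\xa2\x40", b"\xa2\xcc", b"\xa2\xce"]


def big5_roundtrip(raw):
    """Decode raw Big5 bytes, re-encode the result, and return both."""
    text = raw.decode("big5")
    return text, text.encode("big5")


for raw in CP950_STYLE_CODES:
    text, back = big5_roundtrip(raw)
    status = "same" if back == raw else "differs"
    print(raw.hex(), "-> U+%04X ->" % ord(text), back.hex(), status)
```

  A "differs" result means the encoder chose the other Big5 code for
  that character, which is the lossy direction described above.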
       
    30 
       
    31 
       
    32 2) cp932 codec
       
    33 
       
    34   To conform to Windows's real mapping, the cp932 codec maps the following
       
    35   codepoints in addition to the official cp932 mapping.
       
    36 
       
    37     CP932     Unicode     Description
       
    38 
       
    39     0x80      0x80        UNDEFINED
       
    40     0xA0      0xF8F0      UNDEFINED
       
    41     0xFD      0xF8F1      UNDEFINED
       
    42     0xFE      0xF8F2      UNDEFINED
       
    43     0xFF      0xF8F3      UNDEFINED
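
  As a quick check of the table above, the following Python 3 sketch
  decodes each single byte and encodes the listed code point back. The
  names EXTRA_CP932 and check_cp932_extras are made up for this
  example; the values are copied from the table.

```python
# Single-byte additions from the table: CP932 byte -> Unicode code point.
EXTRA_CP932 = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}


def check_cp932_extras():
    """Return {byte: (decode_matches, encode_matches)} for each entry."""
    results = {}
    for byte, codepoint in EXTRA_CP932.items():
        raw = bytes([byte])
        decoded = raw.decode("cp932")
        encoded = chr(codepoint).encode("cp932")
        results[byte] = (ord(decoded) == codepoint, encoded == raw)
    return results


for byte, (dec_ok, enc_ok) in sorted(check_cp932_extras().items()):
    print("0x%02X decode_ok=%s encode_ok=%s" % (byte, dec_ok, enc_ok))
```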
       
    44 
       
    45 
       
    46 3) euc-jisx0213 codec
       
    47 
       
    48   The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
       
    49   Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
       
    50   Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0
       
    51   is shown as a full-width character, mapping to U+FF3C makes
       
    52   more sense.
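
  The mapping of Plane 1 code 0x2140 can be illustrated in Python 3.
  The helper jis_to_euc is hypothetical; it only applies the usual EUC
  rule of setting the high bit on both bytes of a Plane 1 code.

```python
def jis_to_euc(code):
    """Convert a two-byte JIS X 0213 Plane 1 code to its EUC byte pair."""
    return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])


raw = jis_to_euc(0x2140)                 # EUC form of Plane 1 0x2140
fullwidth = raw.decode("euc_jisx0213")   # full-width backslash
halfwidth = b"\x5c".decode("euc_jisx0213")  # ASCII backslash

print(raw.hex(), "U+%04X" % ord(fullwidth), "U+%04X" % ord(halfwidth))
```

  Both backslash forms stay distinct, so each one survives a decode and
  re-encode cycle.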
       
    53 
       
    54   The euc-jisx0213 codec can also decode JIS X 0212 codes on
       
    55   codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not
       
    56   overlap each other, this does not break standard conformance
       
    57   (and JIS X 0213 Plane 2 is intended to be used this way). When
       
    58   encoding, the codec tries to encode kanji characters in this
       
    59   order:
       
    60 
       
    61     JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
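
  To see which form the encoder picked for a given character, one can
  look at the lead byte of the output. The sketch below is Python 3;
  the function euc_form and its labels are assumptions made for this
  example and are not part of the codec. The 0x8F prefix alone cannot
  distinguish JIS X 0213 Plane 2 from JIS X 0212, so the two are
  reported together.

```python
def euc_form(ch):
    """Classify the euc_jisx0213 encoding of one character by lead byte."""
    raw = ch.encode("euc_jisx0213")
    lead = raw[0]
    if lead < 0x80:
        return "ascii"
    if lead == 0x8E:
        return "half-width katakana"
    if lead == 0x8F:
        # Three-byte form: JIS X 0213 Plane 2 or JIS X 0212.
        return "plane 2 or JIS X 0212"
    return "plane 1"


for ch in ("A", "\uff71", "\u4e9c"):
    print("U+%04X" % ord(ch), euc_form(ch))
```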
       
    62 
       
    63 
       
    64 4) euc-jp codec
       
    65 
       
    66   The euc-jp codec is built as a compatibility variant on these points:
       
    67    - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (vice versa)
       
    68    - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
       
    69    - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)
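
  The difference between the two-way and the one-way mappings shows up
  when each character is encoded and decoded again. This is a small
  Python 3 sketch; euc_jp_trip is a hypothetical helper name.

```python
def euc_jp_trip(ch):
    """Encode one character to EUC-JP and decode the bytes back."""
    raw = ch.encode("euc_jp")
    return raw, raw.decode("euc_jp")


for ch in ("\uff3c", "\u00a5", "\u203e"):
    raw, back = euc_jp_trip(ch)
    kind = "two-way" if back == ch else "one-way"
    print("U+%04X -> %s -> U+%04X (%s)" % (ord(ch), raw.hex(), ord(back), kind))
```

  For the one-way cases the decoded character is the plain ASCII one,
  so YEN SIGN and OVERLINE are not recovered.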
       
    70 
       
    71 
       
    72 5) shift-jis codec
       
    73 
       
    74   The shift-jis codec maps the 0x20-0x7e area directly to U+20-U+7E
       
    75   instead of using JIS X 0201, for compatibility. The differences are:
       
    76    - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
       
    77    - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
       
    78    - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
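
  The three points above can be confirmed with a short Python 3 sketch.
  The helper name sjis_trip is hypothetical.

```python
def sjis_trip(ch):
    """Encode one character to Shift-JIS and decode the bytes back."""
    raw = ch.encode("shift_jis")
    return raw, raw.decode("shift_jis")


for ch in ("\\", "~", "\uff3c"):
    raw, back = sjis_trip(ch)
    print("U+%04X -> %s -> U+%04X" % (ord(ch), raw.hex(), ord(back)))
```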
       
    79