To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak them or add something to them, please look at the
tools/ subdirectory of the CJKCodecs distribution.


Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does, rather
  than following the Unicode.org mapping, which maps them to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other big5 codes, round-trip compatibility is not guaranteed for
  them.
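
  The round-trip caveat above can be checked with a short sketch. It
  assumes the codec is reachable from Python under the codec name
  "big5"; the helper name roundtrips is hypothetical and used only for
  illustration.

```python
def roundtrips(raw: bytes, codec: str = "big5") -> bool:
    """Return True if decoding and re-encoding raw yields raw again."""
    return raw.decode(codec).encode(codec) == raw


# The two HANGZHOU NUMERAL codes from the table decode to code points
# that are already assigned to other big5 codes.
for raw in (b"\xa2\xcc", b"\xa2\xce"):
    text = raw.decode("big5")
    back = text.encode("big5")
    print(f"0x{raw.hex().upper()} -> U+{ord(text):04X} -> 0x{back.hex().upper()}",
          "roundtrip:", roundtrips(raw))
```

  A caller that needs byte-exact round trips would have to treat these
  codes specially, since the encoder picks the other big5 code.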


2) cp932 codec

  To conform to the mapping that Windows actually uses, the cp932 codec
  maps the following codepoints in addition to the official cp932
  mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
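
  As an illustration of the table above, the following sketch decodes
  each extra single byte and compares it with the listed code point. It
  assumes the codec is available in Python as "cp932"; the dictionary
  name EXTRA_CP932 is made up for this example.

```python
# Single bytes accepted for Windows compatibility, paired with the
# code points listed in the table above.
EXTRA_CP932 = {
    b"\x80": "\x80",
    b"\xa0": "\uf8f0",
    b"\xfd": "\uf8f1",
    b"\xfe": "\uf8f2",
    b"\xff": "\uf8f3",
}

for raw, expected in EXTRA_CP932.items():
    decoded = raw.decode("cp932")
    print(f"0x{raw.hex().upper()} -> U+{ord(decoded):04X}",
          "matches table:", decoded == expected)
```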


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C, which unicode.org's mapping uses.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0
  is displayed as a full-width character, mapping it to U+FF3C makes
  more sense.

  The euc-jisx0213 codec can also decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not
  overlap each other, this does not break standard conformance
  (and JIS X 0213 Plane 2 is intended to be used this way). When
  encoding, the codec tries to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
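
  The 0x2140 mapping described at the start of this section can be
  observed with a small sketch. It assumes the codec is exposed to
  Python as "euc_jisx0213", and that Plane 1 code 0x2140 appears as the
  bytes A1 C0 in EUC form (each byte with the high bit set).

```python
CODEC = "euc_jisx0213"

# JIS X 0213 Plane 1 code 0x2140 in EUC form, and the plain ASCII byte.
fullwidth = b"\xa1\xc0".decode(CODEC)
halfwidth = b"\x5c".decode(CODEC)

print(f"A1C0 -> U+{ord(fullwidth):04X}")
print(f"5C   -> U+{ord(halfwidth):04X}")

# Both characters encode back to distinct byte sequences.
print("U+FF3C ->", "\uff3c".encode(CODEC).hex().upper())
print("U+005C ->", "\\".encode(CODEC).hex().upper())
```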


4) euc-jp codec

  The euc-jp codec deviates from the strict mapping, for compatibility,
  on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0. (both ways)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)
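
  The three points above can be sketched as follows, assuming the codec
  is available in Python as "euc_jp". The variable names are
  illustrative only.

```python
CODEC = "euc_jp"

# Mapped in both directions.
fw_backslash = "\uff3c".encode(CODEC)
print("U+FF3C ->", fw_backslash.hex().upper(),
      "-> U+%04X" % ord(fw_backslash.decode(CODEC)))

# Mapped one way only: decoding the resulting byte gives the ASCII
# character, not the original one.
for ch in ("\u00a5", "\u203e"):
    raw = ch.encode(CODEC)
    print("U+%04X -> 0x%s -> U+%04X" % (ord(ch), raw.hex(), ord(raw.decode(CODEC))))
```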


5) shift-jis codec

  For compatibility, the shift-jis codec maps the 0x20-0x7e area
  directly to U+20-U+7E instead of using JIS X 0201. The differences
  are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
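
  A brief sketch of the differences listed above, assuming the codec is
  available in Python as "shift_jis". The name SAMPLES is hypothetical.

```python
CODEC = "shift_jis"

# Character -> byte sequence stated in the list above.
SAMPLES = {
    "\\": b"\x5c",          # U+005C REVERSE SOLIDUS
    "~": b"\x7e",           # U+007E TILDE
    "\uff3c": b"\x81\x5f",  # U+FF3C FULLWIDTH REVERSE SOLIDUS
}

for ch, expected in SAMPLES.items():
    raw = ch.encode(CODEC)
    print("U+%04X -> %s" % (ord(ch), raw.hex().upper()),
          "matches list:", raw == expected)
```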