author | timkelly |
Thu, 10 Dec 2009 13:45:47 -0600 | |
branch | RCL_2_4 |
changeset 671 | 80524b72f957 |
parent 0 | fb279309251b |
child 1641 | 2b3996fc09a1 |
permissions | -rw-r--r-- |
0
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
1 |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"><html> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
2 |
<head> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
3 |
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
4 |
<meta http-equiv="Content-Style-Type" content="text/css" /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
5 |
<meta name="LASTUPDATED" content="06/17/05 11:09:43" /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
6 |
<title>Multibyte and Unicode Support</title> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
7 |
<link rel="StyleSheet" href="../../book.css" type="text/css"/> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
8 |
</head> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
9 |
<body bgcolor="#FFFFFF"> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
10 |
<h3>Multibyte and Unicode Support</h3> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
11 |
<p> The Carbide preprocessor fully supports the use of multibyte and Unicode-formatted source files.</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
12 |
<p>Source and header files may be encoded in any text encoding the operating system recognizes. (Unicode text decoding support is implemented using native OS services in Win32 and <span class="code">conv()</span> on UNIX systems.)</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
13 |
<p>As per ISO C++98 and C99, universal characters may be used in any string or character literal, identifiers (macros, variables, functions, methods, etc.), and comments. These characters are derived either from multibyte sequences in the source text, by virtue of the source file being encoded in UCS-2 or UCS-4, or by use of the <span class="code">\uXXXX</span> or <span class="code">\UXXXXXXXX</span> escape sequences.</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
14 |
<h4>Wide String Literals</h4> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
15 |
<p> Wide string literals in the form L"xxx" and wide characters in the form L’xxx’ are interpreted in the context of the source text encoding.</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
16 |
<p class="listing">const wchar_t *str = L"Meine Katze ißt Mäuse nachts.";</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
17 |
<p>Multibyte character sequences that appear in strings are internally converted to their Unicode equivalents until the C/C++ token for the string is generated. At that time, if the string literal is a narrow string (i.e. using char), the original multibyte character sequence are emitted. If the string is a wide string (using <span class="code">wchar_t</span>), the Unicode characters translated from the multibyte sequence are emitted. If <span class="code">wchar_t</span> is 16 bits and a character is truncated to 16 bits, a warning is reported. (See ISO C99, §6.4.5.5 for the specification of this behavior.)</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
18 |
<p>In the event that you want translated – and usually truncated – Unicode characters in narrow string literals, enable <span class="code">#pragma multibyteaware_preserve_ literals off</span>.</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
19 |
<h4>Source Encoding</h4> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
20 |
<p>The compiler uses the <a href="../pragmas/p_text_encoding.htm">Source encoding</a> option in to control how it detects the source file encoding. The compiler recognizes the following source encoding options:</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
21 |
<ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
22 |
<li>ASCII–no detection is done, the high-ASCII characters are not interpreted, and wide character strings are merely zero-extensions of the individual bytes in the string. </li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
23 |
<li>Autodetect–the compiler attempts to tell whether byte-encoded source is encoded in UTF-8, Shift-JIS, EUC-JP, ISO-2022-JP, or whichever encoding format the operating system considers the default. This option degrades compile speeds slightly due to the extra scanning. </li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
24 |
<li>System–the compiler uses the system locale without scanning the source.</li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
25 |
<li>UTF-8, Shift-JIS, EUC-JP, ISO-2022-JP– self-explanatory. </li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
26 |
</ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
27 |
<p class="note"><strong>NOTE</strong> Currently, the compiler ignores the mapping in some Japanese character sets of <span class="code">0x5c</span> (ASCII ‘<span class="code">\</span>’) to Yen (<span class="code">\u00A5</span>) because the C++98 and C99 standards say that the ASCII character set must be mapped one-to-one with any multibyte encoding. <span class="code">0x5C</span> is always interpreted as ‘<span class="code">\</span>’ (except inside multibyte character sequences).</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
28 |
<p class="note"><strong>NOTE</strong> The ISO-2022-JP and EUC-JP encoding currently only recognize characters defined by JIS X 0208-1990 (i.e., the escapes ESC $ @, ESC $ B for ISO-2022-JP and two-byte sequences in EUC-JP). The additional characters in JIS X 0212-1990 are not yet recognized. </p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
29 |
<p>No matter what the Source encoding setting, the compiler always detects <span class="code">UTF-16{BE,LE}</span> or <span class="code">UCS-4{BE,LE}</span> source through a statistical character scan for NULs. </p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
30 |
<p class="note"><strong>NOTE</strong> Currently, only the command-line tools, not the IDE, can properly handle sources in Unicode format (UTF-16, UCS-2, UCS-4).</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
31 |
<p>Alternately, individual source files can identify which source text encoding is present using #pragma text_encoding. The format is:<br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
32 |
#pragma text_encoding("name" | unknown | reset [, global])</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
33 |
<p>Where name is an IANA or MIME encoding name or an OS-specific string. </p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
34 |
<p>The compiler recognizes these names and maps them to its internal decoders:</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
35 |
<p class="listing">system US-ASCII ASCII ANSI_X3.4-1968 ANSI_X3.4-1968<br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
36 |
ANSI_X3.4 UTF-8 UTF8 ISO-2022-JP CSISO2022JP ISO2022JP<br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
37 |
CSSHIFTJIS SHIFT-JIS SJIS EUC-JP EUCJP <br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
38 |
UCS-2 UCS-2BE UCS-2LE UCS2 UCS2BE UCS2LE UTF-16 U<br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
39 |
TF-16BE UTF-16LE UTF16 UTF16BE UTF16LE UCS-4 UCS-4BE <br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
40 |
UCS-4LE UCS4 UCS4BE UCS4LE 10646-1:1993 ISO-10646-1 <br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
41 |
ISO-10646 unicode</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
42 |
<p class="note"><strong>NOTE</strong> UCS-2 is always interpreted as UTF-16; i.e. surrogate character pairs are used to select characters through plane 16.</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
43 |
<p>This #pragma may be used several times in one file (probably unlikely usage). The compiler expects the pragma to be encoded in the “current” text encoding, through the end of line. </p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
44 |
<p>By default, #pragma text_encoding is only effective through the end of file. To affect the default text encoding assumed for the current and all subsequent files, supply the “global” modifier.<br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
45 |
</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
46 |
<h4>Other Support</h4> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
47 |
<p>In addition, note the following:</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
48 |
<ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
49 |
<li>Specify Universal character names (i.e. Unicode code points) with the<span class="code"> \uXXXX</span> or <span class="code">\UXXXXXXXX</span> escape sequences:</li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
50 |
</ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
51 |
<blockquote> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
52 |
<p class="listing">wchar_t florette = L’\u273f’;<br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
53 |
int \u30AD\u30B3\u30DE\u30A6 = soy_sauce();</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
54 |
</blockquote> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
55 |
<ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
56 |
<li>You can use \uXXXX or \UXXXXXXXX regardless of the actual size of <span class="code">wchar_t</span> (use of the escape does not impose a character width on the literal).</li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
57 |
<li>Preprocessed text is always emitted in ASCII. Wide characters are emitted in the <span class="code">\uXXXX</span> or <span class="code">\UXXXXXXXX</span> format.</li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
58 |
</ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
59 |
<blockquote> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
60 |
<p class="listing"> extern string *Wörter[]; <br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
61 |
--><br /> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
62 |
extern string *W\u00F6rter[];</p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
63 |
</blockquote> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
64 |
<ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
65 |
<li>Identifiers using characters not representable in ASCII are emitted in object code with the <span class="code">\uXXXX</span> or <span class="code">\UXXXXXXXX</span> escape sequences. If an object format does not support the ‘<span class="code">\\</span>’ character, the encoding may be modified on a target-specific basis.</li> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
66 |
</ul> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
67 |
<p>Also see “Character Constants as Integer Values” for information on creating a character constant consisting of more than one character (not to be confused with this topic). </p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
68 |
<p>For more information on Unicode, see <a href="http://www.unicode.org">www.unicode.org</a>. </p> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
69 |
<div id="footer">Copyright © 2009 Nokia Corporation and/or its subsidiary(-ies). All rights reserved. <br>License: <a href="http://www.eclipse.org/legal/epl-v10.html">http://www.eclipse.org/legal/epl-v10.html</a></div> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
70 |
|
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
71 |
|
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
72 |
</body> |
fb279309251b
DP tools release version Revision: 200912
Deepak Modgil <Deepak.Modgil@Nokia.com>
parents:
diff
changeset
|
73 |
</html> |