Symbian3/SDK/Source/GUID-06EDE5E8-04EA-5A74-ADE2-E5B8C49AB292.dita
author Dominic Pinkman <dominic.pinkman@nokia.com>
Fri, 11 Jun 2010 12:39:03 +0100
changeset 8 ae94777fff8f
parent 7 51a74ef9ed63
permissions -rw-r--r--
Week 23 contribution of SDK documentation content. See release notes for details. Fixes bugs Bug 2714, Bug 462.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
7
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     1
<?xml version="1.0" encoding="utf-8"?>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     2
<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     3
<!-- This component and the accompanying materials are made available under the terms of the License 
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     4
"Eclipse Public License v1.0" which accompanies this distribution, 
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     5
and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     6
<!-- Initial Contributors:
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     7
    Nokia Corporation - initial contribution.
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     8
Contributors: 
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
     9
-->
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    10
<!DOCTYPE concept
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    11
  PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    12
<concept id="GUID-06EDE5E8-04EA-5A74-ADE2-E5B8C49AB292" xml:lang="en"><title>Character
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    13
Conversion (Charconv) Framework Concepts</title><prolog><metadata><keywords/></metadata></prolog><conbody>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    14
<p>This section describes the terminology used often in character conversions,
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    15
such as BMP and Charconv converters. </p>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    16
<section id="GUID-F964FD3C-D80B-4DBB-A99D-71CC60C362FC"><title>Character sets </title> <p>Textual data in electronic devices
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    17
is stored in terms of a character set. A character set is a group of characters,
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    18
each of which is encoded as a different number. The appearance of each character
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    19
is not a property of the character set, but rather of the font. So a character
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    20
may be rendered using many different glyphs, but will always have the same
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    21
numeric value within its character set. Other properties which can also be
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    22
included in a character set’s definition are the direction of writing, and
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    23
the way in which sets of characters are combined. </p> </section>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    24
<section id="GUID-58021C48-1A3D-41C8-8B82-16C0481BFDCB"><title>Unicode, UCS and UCS-2</title> <p>Character sets, and the
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    25
ways of encoding them, have proliferated with the increasing acceptance of
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    26
computers and communicators throughout the world. This has led to an international
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    27
standard character set, which encompasses all commonly used character sets,
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    28
including Eastern ideograms, in a single character set, Unicode, defined by
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    29
the Unicode Consortium (http://www.unicode.org). </p> <p>UCS is the name for
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    30
Unicode Character Set. Unicode characters are generally encoded using one
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    31
16-bit value but written to files in two bytes. This is referred to as UCS-2
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    32
encoding formats. There are also other Unicode encoding formats such as UTF-16
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    33
and UTF-8 for different purposes. For the full definition of these formats,
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    34
see The Unicode Standard published by the Unicode Consortium. </p> </section>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    35
<section id="GUID-24F61FEA-C3FE-4CBB-BDA2-4FF741288B63"><title>BMP</title> <p>Unicode points between U+0000 to U+FFFF are
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    36
called Basic Multilingual Plane (BMP). BMP covers almost all characters in
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    37
different languages. Code points outside the BMP must be encoded using a "surrogate
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    38
pair", which consists of two 16-bit values. The Symbian platform
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    39
currently does not support scripts with characters mapped to code points above
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    40
U+FFFF. Code points above U+FFFF are also known as supplementary characters. </p> </section>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    41
<section id="GUID-21DF5FEF-2446-4D23-8139-869A0CD7B514"><title> UTF-16</title> <p>UTF-16 is one of the Unicode encoding formats.
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    42
It supports characters within and outside BMP using a number of 16-bit characters. </p> <p>In
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    43
the text-processing subsystem, the Symbian platform uses UTF-16 Unicode format.
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    44
This means that any input to the text-processing subsystem must be in UTF-16.
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    45
Different character converters can be used to convert text from other encoding
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    46
formats to UTF-16. </p> </section>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    47
<section id="GUID-786FEE95-D7A5-4E41-AB41-C8D54BFB8C54"><title>Transformation formats </title> <p>The UCS-2 format of the
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    48
Unicode character set encodes each character as 2 bytes (16 bits total). However
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    49
it does not specify which of the bytes is most significant. The byte order,
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    50
or endian-ness, is left up to the discretion of a particular operating system. </p> <p>While
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    51
this is not important within a system, it does mean that text encoded as UCS-2
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    52
cannot easily be shared between systems using a different endian-ness. To
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    53
overcome this problem the Unicode Consortium has defined two transformation
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    54
formats for sharing Unicode text. The transformation formats explicitly specify
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    55
byte order, and cannot be misinterpreted by computers using a different byte
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    56
order. </p> <p>The two transformation formats, UTF-7 and UTF-8, are described
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    57
below. For the full definition of these formats, see The Unicode Standard
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    58
published by the Unicode Consortium. </p> <p> <b>UTF-7</b>  </p> <p>UTF-7
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    59
allows Unicode characters to be encoded and transmitted as 8-bit bytes, of
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    60
which only 7 bits are used. UTF-7 divides the set of Unicode characters into
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    61
three subsets, which are encoded and transmitted differently. </p> <ul>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    62
<li id="GUID-8E1A1C8B-8234-57C3-93D4-5A0A4E8C1374"><p>Set D, is the set of
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    63
characters which are encoded as a single byte. It includes lower and upper
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    64
case A to Z, the numeric digits, and nine other characters. </p> </li>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    65
<li id="GUID-3E19560B-4087-575E-A091-64FCFD24C811"><p>Set O includes the characters <b>!
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    66
" # $ % &amp; * ; &lt; = &gt; @ [ ] ^ _ </b> <b>{</b> <b> | </b> <b>}</b>. These
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    67
characters can be encoded as a single byte, or with the modified <keyword>base
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    68
                64</keyword>  encoding used for set B characters. When
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    69
encoded as a single byte, set O characters can be misinterpreted by some applications —
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    70
encoding as modified base 64 overcomes this problem. </p> </li>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    71
<li id="GUID-01E822AE-71CC-5F0B-BC60-53F914600A5E"><p>Set B comprises the
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    72
remaining characters, which are encoded as an <keyword>escape byte</keyword> followed
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    73
by 2 or 3 bytes. The encoding format is a modified form of base 64 encoding. </p> </li>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    74
</ul> <p> <b>UTF-8</b>  </p> <p>UTF-8 encodes and transmits Unicode characters
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    75
as a string of 8-bit bytes. All the ASCII characters 0 to 127 are encoded
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    76
without change; the most significant bit being set to zero is a signal that
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    77
they have not been changed. Unicode characters U0080 to U07FF are encoded
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    78
in two bytes, the remaining Unicode characters — except for the surrogates —
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    79
are encoded in three bytes. The Unicode surrogate characters are supported
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    80
by the Character Conversion API, but are not currently supported by all Symbian
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    81
platform components. </p> <p>A variant of UTF-8 used internally by Java differs
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    82
from standard UTF-8 in two ways. First, the specific case of the NULL character
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    83
(0x0000) is encoded in the two-byte format, and second, only the one-, two-
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    84
and three-byte formats are used, not the four-byte format which is normally
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    85
used for Unicode surrogate-pairs. An argument to <codeph>ConvertFromUnicodeToUtf8</codeph> controls
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    86
whether the UTF-8 generated by this is the Java variant. Support for this
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    87
was removed in v6.0. </p> </section>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    88
<section id="GUID-5E593C5A-882B-5B11-AD6E-CFD10EA6700B"><title>Charconv converter</title> <p>Each
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    89
converter implements a conversion between a single foreign character encoding
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    90
and UTF-16, and is identified by a Unique Identifier (UID). The Symbian platform
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    91
provides the following two types of converter: </p> <ul>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    92
<li id="GUID-8A1AFA7F-E330-5309-9ED7-26A4A46411CB"><p>built into the Framework
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    93
component and used by most languages </p> </li>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    94
<li id="GUID-E7A3FD30-37E0-5B01-B2BE-7DE313045D9F"><p>implemented as Ecom
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    95
plug-ins in the Plug-ins component and used by certain languages. </p> </li>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    96
</ul> </section>
51a74ef9ed63 Week 12 contribution of API Specs and fix SDK submission
Dominic Pinkman <Dominic.Pinkman@Nokia.com>
parents:
diff changeset
    97
</conbody></concept>