Symbian3/SDK/Source/GUID-FE94596E-B5BB-51FE-BE38-069840323915.dita
changeset 7 51a74ef9ed63
child 8 ae94777fff8f
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Symbian3/SDK/Source/GUID-FE94596E-B5BB-51FE-BE38-069840323915.dita	Wed Mar 31 11:11:55 2010 +0100
@@ -0,0 +1,191 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
+<!-- This component and the accompanying materials are made available under the terms of the License 
+"Eclipse Public License v1.0" which accompanies this distribution, 
+and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
+<!-- Initial Contributors:
+    Nokia Corporation - initial contribution.
+Contributors: 
+-->
+<!DOCTYPE concept
+  PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="GUID-FE94596E-B5BB-51FE-BE38-069840323915" xml:lang="en"><title>Encoding
+Types</title><prolog><metadata><keywords/></metadata></prolog><conbody>
+<p>This topic describes the types of SMS encoding. </p>
+<section id="GUID-F7D1E6C8-9605-57FA-9788-AF7FC72BD94C"><title>7-bit GSM encoding</title> <p>7-bit
+GSM encoding supports the GSM 7-bit default alphabet and GSM 7-bit default
+alphabet extension table through an escape mechanism. </p> <p>Figure 1 </p> <fig id="GUID-CDEE59FC-F035-5B75-8838-96E94A6714E8">
+<title>              Escape mechanism            </title>
+<image href="GUID-08A6B93F-92CD-5182-B142-D353E78016F3_d0e406761_href.png" placement="inline"/>
+</fig> <p>The GSM 7-bit default alphabet consists of 128 characters. Each
+character is represented by 7 bits. 10 extra characters are defined in the
+GSM 7-bit default extension table. These characters are represented by an
+escape mechanism using the escape character (0x1B). For example, 0x1B65 maps
+to the Euro sign € (U+20AC). If an escape character byte is followed by a
+character that is not included in the 10 characters, the escape character
+is just ignored. This means 0x1B41 maps to Latin capital letter A (U+0041). </p> <p>For
+more information about the GSM 7-bit default table, extension table and escape
+mechanism, see 3GPP TS 23.038 V8.1.0. </p> </section>
+<section id="GUID-918FF2E3-B9F4-5C61-8DBA-F9143DB16460"><title>Lossy 7-bit
+encoding</title> <p>Lossy 7-bit encoding enlarges the character set supported
+by 7-bit GSM encoding. Some Unicode Characters do not exist in the target
+7-bit set. These characters are converted to ones that do exist in the target
+7-bit set and closely resemble the original, intended character. A lossy encoding
+using a 7-bit encoding is more cost effective than a UCS-2 encoding. </p> <p> <b>Example
+of 7-bit encoding</b>  </p> <p>Accented Latin characters are not supported
+by 7-bit GSM encoding. Figure 2 describes how an accented Latin characters
+Á, is sent by SMS. Á has a Unicode value of 0x00C1. When it is processed by
+the Lossy converter the character is converted from the Unicode to 7-bit code
+letter A. A has a 7-bit code of 0x41. The SMS receiver reads A instead of
+Á. By substituting the character that is similar enough to the original, the
+reader can understand the word. The process of converting Á to A is called
+a lossy conversion. </p> <p> <b>Note</b>: The 7-bit code of A (0x41) can only
+be decoded back to the same Unicode letter A instead of Á. </p> <p>Figure
+2 </p> <fig id="GUID-ACFF9511-D5E0-5558-8008-4CD48EE0B7A1">
+<title>              Lossy conversion            </title>
+<image href="GUID-8862E271-ABA4-5A25-8990-C0B3931E370D_d0e406801_href.png" placement="inline"/>
+</fig> </section>
+<section id="GUID-D2F0E6BE-932E-545D-A0C8-39017E3D67B4"><title>16-bit Unicode
+encoding</title> <p>Unicode is an international standard character set. It
+includes the characters of every language. In Unicode, each character is usually
+encoded in two 8-bit bytes, and takes up more space than 7-bit encoding. </p> </section>
+<section id="GUID-93B3DDF2-8EB1-5853-9DFD-3ABF42ADCB40"><title>National language
+encoding</title> <p>According to 3GPP TS 23.038 V8.1.0, National Language
+Encoding supports additional characters for certain languages which cannot
+be represented in the GSM default 7-bit alphabet. It defines two mechanisms
+for doing this: </p> <ul>
+<li id="GUID-9ECCA8BD-0BA0-5AE2-B2D6-4677D2CD1BD7"><p>Locking shift mechanism–the
+default GSM table is replaced with a table containing the character set needed
+for a language. The table is referred to as locking shift table. </p> </li>
+<li id="GUID-3900D849-350A-5722-9759-D1D768FE6A84"><p>Single shift mechanism–the
+GSM extension table is replaced with a table containing the character set
+needed for a language. The table is referred to as single shift table. </p> </li>
+</ul> <p>When the locking shift mechanism is used, the escape table can be
+the existing GSM extension table or it can be the escape table used by the
+single shift mechanism. This supports three possible mappings as shown in
+Figure 3: </p> <ul>
+<li id="GUID-34ECF450-6265-58E2-9CB6-00E0C5DDA6F8"><p>The GSM 7-bit default
+escapes to language-specific escape table. It is referred to as GSM-single. </p> </li>
+<li id="GUID-6E8A53BF-0572-5DE2-8D41-FB588B6FB812"><p>The Language-specific
+basic table escapes to GSM 7-bit default extension table. It is referred to
+as locking-GSM ext. </p> </li>
+<li id="GUID-830569B1-8ACD-5924-AF7F-15705FEF76B0"><p>The Language-specific
+basic table escapes to language-specific extension table. It is referred to
+as locking-single. </p> </li>
+</ul> <p>Figure 3 </p> <fig id="GUID-541CED9A-2450-5C9D-AADF-93EE59E4D77E">
+<title>              National language encoding            </title>
+<image href="GUID-44347376-702D-5648-8938-EB55AFA329EC_d0e406863_href.png" placement="inline"/>
+</fig><p>The single shift mechanism is useful when a message contains only
+a few characters outside the default GSM table. It is however inefficient
+when a message contains many unsupported characters, because each escaped
+character must occupy 2 bytes. GSM-single supports more characters than locking-GSM
+ext, but these characters are in the single table, which takes 2 bytes. Locking-single
+is used more for the decoding purpose in case the extra characters can come
+from the locking or single table. </p><p>The locking or single table is not
+a complete replacement. For example, the locking table for Turkish redefines
+only 8-character codes from the default GSM table, as shown in table 1. The
+escape table for Turkish adds 7 characters to the GSM extension, as shown
+in table 2. </p><table id="GUID-4AE6F58D-A5DA-4AD9-B39E-A61AA378F3F6"><title>Table 1</title>
+<tgroup cols="3"><colspec colname="col1"/><colspec colname="col2"/><colspec colname="col3"/>
+<thead>
+<row>
+<entry><p>GSM 7-Bit Code</p></entry>
+<entry><p>Turkish Locking Shift Table</p></entry>
+<entry><p>GSM 7-Bit Default Table</p></entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry><p><codeph>0x40</codeph></p></entry>
+<entry><p>I LATIN CAPITAL LETTER I WITH DOT ABOVE</p></entry>
+<entry><p>¡ INVERTED EXCLAMATION MARK </p></entry>
+</row>
+<row>
+<entry><p><codeph>0x60</codeph></p></entry>
+<entry><p>ç LATIN SMALL LETTER C WITH CEDILLA</p></entry>
+<entry><p>¿ INVERTED QUESTION MARK</p></entry>
+</row>
+<row>
+<entry><p><codeph>0x04</codeph></p></entry>
+<entry><p>€ EURO SIGN</p></entry>
+<entry><p>è LATIN SMALL LETTER E WITH GRAVE</p></entry>
+</row>
+<row>
+<entry><p><codeph>0x07</codeph></p></entry>
+<entry><p>i LATIN SMALL LETTER DOTLESS</p></entry>
+<entry><p>ì LATIN SMALL LETTER I WITH GRAVE</p></entry>
+</row>
+<row>
+<entry><p><codeph>0x0B</codeph></p></entry>
+<entry><p>G LATIN CAPITAL LETTER G WITH BREVE</p></entry>
+<entry><p>Ø LATIN CAPITAL LETTER O WITH STROKE</p></entry>
+</row>
+<row>
+<entry><p><codeph>0x0C</codeph></p></entry>
+<entry><p>g LATIN SMALL LETTER G WITH BREVE</p></entry>
+<entry><p>ø LATIN SMALL LETTER O WITH STROKE</p></entry>
+</row>
+<row>
+<entry><p><codeph>0x1C</codeph></p></entry>
+<entry><p>S LATIN CAPITAL LETTER S WITH CEDILLA *</p></entry>
+<entry><p>Æ LATIN CAPITAL LETTER AE</p></entry>
+</row>
+<row>
+<entry><p><codeph>0x1D</codeph></p></entry>
+<entry><p>s LATIN SMALL LETTER S WITH CEDILLA *</p></entry>
+<entry><p>æ LATIN SMALL LETTER AE</p></entry>
+</row>
+</tbody>
+</tgroup>
+</table> <table id="GUID-EC345039-0CB5-4F51-8CFA-83286790AC75"><title>Table 2</title>
+<tgroup cols="3"><colspec colname="col1"/><colspec colname="col2"/><colspec colname="col3"/>
+<thead>
+<row>
+<entry><p>GSM 7-Bit Code</p></entry>
+<entry><p>Turkish Single Shift Table</p></entry>
+<entry><p>GSM 7-Bit Extension Table</p></entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry><p><codeph>0x1B49</codeph></p></entry>
+<entry><p>I LATIN CAPITAL LETTER I WITH DOT ABOVE</p></entry>
+<entry><p/></entry>
+</row>
+<row>
+<entry><p><codeph>0x1B63</codeph></p></entry>
+<entry><p>ç LATIN SMALL LETTER C WITH CEDILLA</p></entry>
+<entry><p/></entry>
+</row>
+<row>
+<entry><p><codeph>0x1B69</codeph></p></entry>
+<entry><p>i LATIN SMALL LETTER DOTLESS</p></entry>
+<entry><p/></entry>
+</row>
+<row>
+<entry><p><codeph>0x1B47</codeph></p></entry>
+<entry><p>G LATIN CAPITAL LETTER G WITH BREVE</p></entry>
+<entry><p/></entry>
+</row>
+<row>
+<entry><p><codeph>0x1B67</codeph></p></entry>
+<entry><p>g LATIN SMALL LETTER G WITH BREVE</p></entry>
+<entry><p/></entry>
+</row>
+<row>
+<entry><p><codeph>0x1B53</codeph></p></entry>
+<entry><p>S LATIN CAPITAL LETTER S WITH CEDILLA *</p></entry>
+<entry><p/></entry>
+</row>
+<row>
+<entry><p><codeph>0x1B73</codeph></p></entry>
+<entry><p>s LATIN SMALL LETTER S WITH CEDILLA *</p></entry>
+<entry><p/></entry>
+</row>
+</tbody>
+</tgroup>
+</table><p>For more information about the National Language Identifier, Single
+or Locking mechanism, see 3GPP TS 23.038 V8.1.0: National Language Identifier.</p></section>
+<section><title>See also</title> <p> <xref href="GUID-0BC9A9A1-DB99-5095-8390-E1C1B04D0080.dita">SMS
+Encodings and Converters Overview</xref> </p> </section>
+</conbody></concept>
\ No newline at end of file