<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
<!-- This component and the accompanying materials are made available under the terms of the License 
"Eclipse Public License v1.0" which accompanies this distribution, 
and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
<!-- Initial Contributors:
    Nokia Corporation - initial contribution.
Contributors: 
-->
<!DOCTYPE concept
  PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
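<concept id="GUID-FE94596E-B5BB-51FE-BE38-069840323915" xml:lang="en"><title>Encoding
Types</title><prolog><metadata><keywords/></metadata></prolog><conbody>
<p>This topic describes the character encodings that can be used for SMS messages:
7-bit GSM encoding, lossy 7-bit encoding, 16-bit Unicode encoding and national
language encoding. </p>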
<section id="GUID-F7D1E6C8-9605-57FA-9788-AF7FC72BD94C"><title>7-bit GSM encoding</title> <p>7-bit
GSM encoding supports the GSM 7-bit default alphabet and, through an escape
mechanism, the GSM 7-bit default alphabet extension table. </p> <p>Figure 1 </p> <fig id="GUID-CDEE59FC-F035-5B75-8838-96E94A6714E8">
<title>Escape mechanism</title>
<image href="GUID-08A6B93F-92CD-5182-B142-D353E78016F3_d0e406761_href.png" placement="inline"/>
</fig> <p>The GSM 7-bit default alphabet consists of 128 characters, each
represented by a 7-bit code. A further 10 characters are defined in the
GSM 7-bit default alphabet extension table. Each of these is represented by the
escape character (0x1B) followed by a second 7-bit code. For example, 0x1B65 maps
to the Euro sign € (U+20AC). If the escape character is followed by a code that
does not correspond to one of these 10 characters, the escape character is
ignored and the code is decoded from the default alphabet. For example, 0x1B41
maps to Latin capital letter A (U+0041). </p> <p>For
more information about the GSM 7-bit default alphabet, the extension table and the escape
mechanism, see 3GPP TS 23.038 V8.1.0. </p> </section>
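<example><title>Example: decoding escape sequences</title><p>The following Python sketch shows how the escape mechanism described above can be decoded. It is not the Symbian converter API. The function <codeph>decode_septets()</codeph> and the table names are hypothetical, the input is assumed to be a list of unpacked 7-bit codes, and only part of the default alphabet is included: space, digits and Latin letters, which have the same values as in ASCII. The extension table lists the 10 characters defined by 3GPP TS 23.038.</p>
<codeblock outputclass="language-python">
```python
# Sketch of GSM 7-bit decoding with the escape mechanism (hypothetical names).
# Input: a list of unpacked 7-bit codes, one integer per code.

ESC = 0x1B  # escape character

# Partial GSM 7-bit default alphabet: space, digits and Latin letters
# have the same values as in ASCII.
DEFAULT = {0x20: " "}
DEFAULT.update({c: chr(c) for c in range(0x30, 0x3A)})  # 0-9
DEFAULT.update({c: chr(c) for c in range(0x41, 0x5B)})  # A-Z
DEFAULT.update({c: chr(c) for c in range(0x61, 0x7B)})  # a-z

# The 10 characters of the GSM 7-bit default alphabet extension table.
EXTENSION = {
    0x0A: "\f",      # form feed (page break)
    0x14: "^",
    0x28: "{",
    0x29: "}",
    0x2F: "\\",
    0x3C: "[",
    0x3D: "~",
    0x3E: "]",
    0x40: "|",
    0x65: "\u20ac",  # EURO SIGN
}


def decode_septets(codes):
    """Decode unpacked 7-bit codes into a Unicode string."""
    out = []
    it = iter(codes)
    for code in it:
        if code != ESC:
            out.append(DEFAULT.get(code, "?"))
            continue
        nxt = next(it, None)  # the code that follows the escape
        if nxt is None:
            break  # a trailing escape has nothing to modify
        if nxt in EXTENSION:
            out.append(EXTENSION[nxt])
        else:
            # Not an extension character: ignore the escape and
            # decode the code from the default alphabet.
            out.append(DEFAULT.get(nxt, "?"))
    return "".join(out)


print(f"U+{ord(decode_septets([0x1B, 0x65])):04X}")  # U+20AC
print(decode_septets([0x1B, 0x41]))                  # A
```
</codeblock>
<p>In this sketch, codes that are missing from the partial default table decode to <codeph>?</codeph>. A complete decoder would use the full 128-character default alphabet.</p></example>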
       
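<example><title>Example: packing 7-bit codes</title><p>Because each character takes only 7 bits, the codes are packed together rather than stored one per byte. The following Python sketch assumes the SMS packing described in 3GPP TS 23.038: codes are placed one after another, least significant bit first, and the resulting bit stream is split into octets. The function name <codeph>pack_septets()</codeph> is hypothetical.</p>
<codeblock outputclass="language-python">
```python
def pack_septets(codes):
    """Pack 7-bit codes into octets, least significant bits first."""
    out = bytearray()
    acc = 0    # bits waiting to be written
    nbits = 0  # number of bits currently held in acc
    for code in codes:
        acc += code * 2 ** nbits  # place the new 7 bits above the held bits
        nbits += 7
        while nbits >= 8:
            out.append(acc % 256)  # emit the lowest 8 bits as one octet
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc)  # last partial octet, padded with zero bits
    return bytes(out)


# Lowercase Latin letters have the same values in GSM 7-bit and ASCII.
codes = [ord(c) for c in "hellohello"]
print(pack_septets(codes).hex())  # e8329bfd4697d9ec37
```
</codeblock>
<p>Ten 7-bit codes fit in 9 octets. In general, n codes need 7n bits, rounded up to a whole number of octets.</p></example>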
<section id="GUID-918FF2E3-B9F4-5C61-8DBA-F9143DB16460"><title>Lossy 7-bit
encoding</title> <p>Lossy 7-bit encoding extends the set of characters that can be
sent with 7-bit GSM encoding. Some Unicode characters do not exist in the target
7-bit character set. Lossy encoding converts each of these characters to one that
does exist in the target set and closely resembles the original, intended character.
Sending text with lossy 7-bit encoding is more cost-effective than sending it with
UCS-2 encoding, because each character takes 7 bits (14 bits for an escape sequence)
instead of 16. </p> <p> <b>Example
of lossy 7-bit encoding</b>  </p> <p>Many accented Latin characters, such as Á, are not supported
by 7-bit GSM encoding. Figure 2 shows how the accented Latin character
Á is sent by SMS. Á has the Unicode value U+00C1. When it is processed by
the lossy converter, it is converted to the Latin capital letter A, which has the
7-bit code 0x41. The SMS receiver therefore reads A instead of
Á. Because the substituted character is similar enough to the original, the
reader can still understand the word. The process of converting Á to A is called
a lossy conversion. </p> <p> <b>Note</b>: The 7-bit code of A (0x41) can only
be decoded back to the Unicode letter A, not to Á. </p> <p>Figure
2 </p> <fig id="GUID-ACFF9511-D5E0-5558-8008-4CD48EE0B7A1">
<title>Lossy conversion</title>
<image href="GUID-8862E271-ABA4-5A25-8990-C0B3931E370D_d0e406801_href.png" placement="inline"/>
</fig> </section>
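<example><title>Example: choosing a lossy substitute</title><p>The following Python sketch shows one way a lossy converter could pick a substitute character. It is an illustration only. The Symbian lossy converter uses its own mapping tables, whereas this sketch removes accents with Unicode canonical decomposition (NFD) from the standard <codeph>unicodedata</codeph> module. The table and function names are hypothetical, and the partial table leaves out accented letters that the default alphabet does contain, such as é (0x05), which a real converter would encode directly.</p>
<codeblock outputclass="language-python">
```python
import unicodedata

# Partial GSM 7-bit default alphabet (hypothetical subset): space, question
# mark and Latin letters, which have the same values as in ASCII.
TO_GSM = {" ": 0x20, "?": 0x3F}
TO_GSM.update({chr(c): c for c in range(0x41, 0x5B)})  # A-Z
TO_GSM.update({chr(c): c for c in range(0x61, 0x7B)})  # a-z
FROM_GSM = {code: ch for ch, code in TO_GSM.items()}


def lossy_substitute(ch):
    """Return a look-alike that exists in TO_GSM, or '?' if there is none.

    Assumption: accents are removed by canonical decomposition (NFD),
    which splits U+00C1 into 'A' followed by a combining acute accent.
    """
    base = unicodedata.normalize("NFD", ch)[0]
    return base if base in TO_GSM else "?"


def encode_lossy(text):
    """Encode text as 7-bit codes, substituting unsupported characters."""
    return [TO_GSM[ch] if ch in TO_GSM else TO_GSM[lossy_substitute(ch)]
            for ch in text]


def decode_gsm(codes):
    """Decode 7-bit codes; the original accents cannot be recovered."""
    return "".join(FROM_GSM[c] for c in codes)


codes = encode_lossy("\u00c1")   # LATIN CAPITAL LETTER A WITH ACUTE
print([hex(c) for c in codes])   # ['0x41']
print(decode_gsm(codes))         # A
```
</codeblock>
<p>Decoding the result gives A, not Á, which is the one-way behavior described in the note above.</p></example>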
       
<section id="GUID-D2F0E6BE-932E-545D-A0C8-39017E3D67B4"><title>16-bit Unicode
encoding</title> <p>Unicode is an international standard character set that
covers the characters used by most of the world's writing systems. For SMS, Unicode
text is sent in the UCS-2 encoding, in which each character occupies two 8-bit
bytes (16 bits). It therefore takes up more space than 7-bit encoding. </p> </section>
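<example><title>Example: comparing message sizes</title><p>To make the size difference concrete, the following Python sketch counts the octets needed to carry a text in each encoding. The helper names are hypothetical. It assumes that 7-bit codes are packed back to back, as in SMS, and that a single message carries at most 140 octets of user data when no user data header is present. Under those assumptions, one message holds 160 characters with 7-bit encoding but only 70 with UCS-2.</p>
<codeblock outputclass="language-python">
```python
# Sketch comparing message sizes (hypothetical helper names).
# Assumption: a single SMS carries at most 140 octets of user data
# when no user data header is present.
SMS_USER_DATA_OCTETS = 140


def gsm7_octets(num_codes):
    """Octets needed for num_codes 7-bit codes packed back to back."""
    return (num_codes * 7 + 7) // 8  # 7 bits per code, rounded up


def ucs2_octets(text):
    """Octets needed for text in UCS-2 (two octets per character).

    UTF-16BE produces the same bytes as UCS-2 for characters in the
    Basic Multilingual Plane, which is all that UCS-2 can represent.
    """
    return len(text.encode("utf-16-be"))


text = "Hello"
print(gsm7_octets(len(text)), ucs2_octets(text))  # 5 10

# Maximum number of characters in one message for each encoding:
print(SMS_USER_DATA_OCTETS * 8 // 7)  # 160
print(SMS_USER_DATA_OCTETS // 2)      # 70
```
</codeblock>
<p>The 7-bit count assumes no escape sequences; each character from the extension table adds one more 7-bit code.</p></example>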
       
<section id="GUID-93B3DDF2-8EB1-5853-9DFD-3ABF42ADCB40"><title>National language
encoding</title> <p>According to 3GPP TS 23.038 V8.1.0, national language
encoding supports additional characters for certain languages that cannot
be represented in the GSM 7-bit default alphabet. It defines two mechanisms
for doing this: </p> <ul>
<li id="GUID-9ECCA8BD-0BA0-5AE2-B2D6-4677D2CD1BD7"><p>Locking shift mechanism: the
GSM 7-bit default table is replaced with a table containing the character set needed
for a language. This table is referred to as the locking shift table. </p> </li>
<li id="GUID-3900D849-350A-5722-9759-D1D768FE6A84"><p>Single shift mechanism: the
GSM 7-bit default extension table is replaced with a table containing the character set
needed for a language. This table is referred to as the single shift table. </p> </li>
</ul> <p>When the locking shift mechanism is used, the escape table can be
either the existing GSM 7-bit default extension table or the escape table used by the
single shift mechanism. This gives three possible mappings, as shown in
Figure 3: </p> <ul>
<li id="GUID-34ECF450-6265-58E2-9CB6-00E0C5DDA6F8"><p>The GSM 7-bit default
table escapes to a language-specific single shift table. This is referred to as GSM-single. </p> </li>
<li id="GUID-6E8A53BF-0572-5DE2-8D41-FB588B6FB812"><p>A language-specific
locking shift table escapes to the GSM 7-bit default extension table. This is referred to
as locking-GSM ext. </p> </li>
<li id="GUID-830569B1-8ACD-5924-AF7F-15705FEF76B0"><p>A language-specific
locking shift table escapes to a language-specific single shift table. This is referred to
as locking-single. </p> </li>
</ul> <p>Figure 3 </p> <fig id="GUID-541CED9A-2450-5C9D-AADF-93EE59E4D77E">
<title>National language encoding</title>
<image href="GUID-44347376-702D-5648-8938-EB55AFA329EC_d0e406863_href.png" placement="inline"/>
</fig><p>The single shift mechanism is useful when a message contains only
a few characters outside the GSM 7-bit default table. However, it is inefficient
when a message contains many such characters, because each escaped character
occupies two 7-bit codes: the escape character and the character code. GSM-single
supports more characters than locking-GSM ext, but the additional characters are in
the single shift table, so each of them takes two 7-bit codes. Locking-single is
mainly useful for decoding, because the extra characters in a received message can
come from either the locking shift table or the single shift table. </p><p>Neither
the locking shift table nor the single shift table is a complete replacement. For
example, the locking shift table for Turkish redefines only 8 character codes of the
GSM 7-bit default table, as shown in Table 1. The single shift table for Turkish adds
7 characters to the GSM 7-bit default extension table, as shown
in Table 2. </p><table id="GUID-4AE6F58D-A5DA-4AD9-B39E-A61AA378F3F6"><title>Table 1: Turkish locking shift table compared with the GSM 7-bit default table</title>
<tgroup cols="3"><colspec colname="col1"/><colspec colname="col2"/><colspec colname="col3"/>
<thead>
<row>
<entry><p>GSM 7-Bit Code</p></entry>
<entry><p>Turkish Locking Shift Table</p></entry>
<entry><p>GSM 7-Bit Default Table</p></entry>
</row>
</thead>
<tbody>
<row>
<entry><p><codeph>0x40</codeph></p></entry>
<entry><p>İ LATIN CAPITAL LETTER I WITH DOT ABOVE</p></entry>
<entry><p>¡ INVERTED EXCLAMATION MARK</p></entry>
</row>
<row>
<entry><p><codeph>0x60</codeph></p></entry>
<entry><p>ç LATIN SMALL LETTER C WITH CEDILLA</p></entry>
<entry><p>¿ INVERTED QUESTION MARK</p></entry>
</row>
<row>
<entry><p><codeph>0x04</codeph></p></entry>
<entry><p>€ EURO SIGN</p></entry>
<entry><p>è LATIN SMALL LETTER E WITH GRAVE</p></entry>
</row>
<row>
<entry><p><codeph>0x07</codeph></p></entry>
<entry><p>ı LATIN SMALL LETTER DOTLESS I</p></entry>
<entry><p>ì LATIN SMALL LETTER I WITH GRAVE</p></entry>
</row>
<row>
<entry><p><codeph>0x0B</codeph></p></entry>
<entry><p>Ğ LATIN CAPITAL LETTER G WITH BREVE</p></entry>
<entry><p>Ø LATIN CAPITAL LETTER O WITH STROKE</p></entry>
</row>
<row>
<entry><p><codeph>0x0C</codeph></p></entry>
<entry><p>ğ LATIN SMALL LETTER G WITH BREVE</p></entry>
<entry><p>ø LATIN SMALL LETTER O WITH STROKE</p></entry>
</row>
<row>
<entry><p><codeph>0x1C</codeph></p></entry>
<entry><p>Ş LATIN CAPITAL LETTER S WITH CEDILLA *</p></entry>
<entry><p>Æ LATIN CAPITAL LETTER AE</p></entry>
</row>
<row>
<entry><p><codeph>0x1D</codeph></p></entry>
<entry><p>ş LATIN SMALL LETTER S WITH CEDILLA *</p></entry>
<entry><p>æ LATIN SMALL LETTER AE</p></entry>
</row>
</tbody>
</tgroup>
</table> <table id="GUID-EC345039-0CB5-4F51-8CFA-83286790AC75"><title>Table 2: Turkish single shift table compared with the GSM 7-bit default extension table</title>
<tgroup cols="3"><colspec colname="col1"/><colspec colname="col2"/><colspec colname="col3"/>
<thead>
<row>
<entry><p>GSM 7-Bit Code</p></entry>
<entry><p>Turkish Single Shift Table</p></entry>
<entry><p>GSM 7-Bit Extension Table</p></entry>
</row>
</thead>
<tbody>
<row>
<entry><p><codeph>0x1B49</codeph></p></entry>
<entry><p>İ LATIN CAPITAL LETTER I WITH DOT ABOVE</p></entry>
<entry><p>Not defined</p></entry>
</row>
<row>
<entry><p><codeph>0x1B63</codeph></p></entry>
<entry><p>ç LATIN SMALL LETTER C WITH CEDILLA</p></entry>
<entry><p>Not defined</p></entry>
</row>
<row>
<entry><p><codeph>0x1B69</codeph></p></entry>
<entry><p>ı LATIN SMALL LETTER DOTLESS I</p></entry>
<entry><p>Not defined</p></entry>
</row>
<row>
<entry><p><codeph>0x1B47</codeph></p></entry>
<entry><p>Ğ LATIN CAPITAL LETTER G WITH BREVE</p></entry>
<entry><p>Not defined</p></entry>
</row>
<row>
<entry><p><codeph>0x1B67</codeph></p></entry>
<entry><p>ğ LATIN SMALL LETTER G WITH BREVE</p></entry>
<entry><p>Not defined</p></entry>
</row>
<row>
<entry><p><codeph>0x1B53</codeph></p></entry>
<entry><p>Ş LATIN CAPITAL LETTER S WITH CEDILLA *</p></entry>
<entry><p>Not defined</p></entry>
</row>
<row>
<entry><p><codeph>0x1B73</codeph></p></entry>
<entry><p>ş LATIN SMALL LETTER S WITH CEDILLA *</p></entry>
<entry><p>Not defined</p></entry>
</row>
</tbody>
</tgroup>
</table><p>For more information about the National Language Identifier and the single
and locking shift mechanisms, see 3GPP TS 23.038 V8.1.0: National Language Identifier.</p></section>
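<example><title>Example: decoding with national language tables</title><p>The three mappings differ only in which table decodes ordinary codes and which table decodes the code that follows the escape character. The following Python sketch, with hypothetical names, decodes the same codes under each mapping using the Turkish entries from Table 1 and Table 2. The tables are partial: apart from the codes in the two tables, only the space and the Latin letters are included, and the extension table contains only the Euro sign. The sketch assumes the same fallback rule as 7-bit GSM encoding, so an escape followed by an undefined code is ignored.</p>
<codeblock outputclass="language-python">
```python
# Sketch of decoding with national language tables (hypothetical names).
ESC = 0x1B

# Partial GSM 7-bit default table: the 8 codes from Table 1, space, A-Z, a-z.
GSM_DEFAULT = {
    0x04: "\u00e8", 0x07: "\u00ec", 0x0B: "\u00d8", 0x0C: "\u00f8",
    0x1C: "\u00c6", 0x1D: "\u00e6", 0x40: "\u00a1", 0x60: "\u00bf",
    0x20: " ",
}
GSM_DEFAULT.update({c: chr(c) for c in range(0x41, 0x5B)})
GSM_DEFAULT.update({c: chr(c) for c in range(0x61, 0x7B)})

# Turkish locking shift table: the default table with 8 codes redefined (Table 1).
TURKISH_LOCKING = dict(GSM_DEFAULT)
TURKISH_LOCKING.update({
    0x04: "\u20ac", 0x07: "\u0131", 0x0B: "\u011e", 0x0C: "\u011f",
    0x1C: "\u015e", 0x1D: "\u015f", 0x40: "\u0130", 0x60: "\u00e7",
})

# GSM 7-bit default extension table (only the Euro sign, for brevity).
GSM_EXTENSION = {0x65: "\u20ac"}

# Turkish single shift table: the extension characters plus 7 more (Table 2).
TURKISH_SINGLE = dict(GSM_EXTENSION)
TURKISH_SINGLE.update({
    0x47: "\u011e", 0x49: "\u0130", 0x53: "\u015e", 0x63: "\u00e7",
    0x67: "\u011f", 0x69: "\u0131", 0x73: "\u015f",
})

# (basic table, escape table) for each of the three mappings.
MAPPINGS = {
    "GSM-single": (GSM_DEFAULT, TURKISH_SINGLE),
    "locking-GSM ext": (TURKISH_LOCKING, GSM_EXTENSION),
    "locking-single": (TURKISH_LOCKING, TURKISH_SINGLE),
}


def decode(codes, basic, escape_table):
    """Decode 7-bit codes with the given basic and escape tables."""
    out = []
    it = iter(codes)
    for code in it:
        if code != ESC:
            out.append(basic.get(code, "?"))
            continue
        nxt = next(it, None)
        if nxt is None:
            break  # trailing escape
        # An undefined escape sequence falls back to the basic table.
        out.append(escape_table.get(nxt, basic.get(nxt, "?")))
    return "".join(out)


def code_points(text):
    """Show a string as Unicode code points, for example 'U+011E U+00F8'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


codes = [ESC, 0x47, 0x0C]  # escape + 0x47, then 0x0C
for name, (basic, escape_table) in MAPPINGS.items():
    print(name, code_points(decode(codes, basic, escape_table)))
# GSM-single U+011E U+00F8
# locking-GSM ext U+0047 U+011F
# locking-single U+011E U+011F
```
</codeblock>
<p>Only locking-single decodes both codes to Turkish letters (Ğ and ğ), because it can take characters from either the locking shift table or the single shift table.</p></example>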
       
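<example><title>Example: counting 7-bit codes for Turkish text</title><p>The cost difference described for the single shift mechanism can be counted directly. The following Python sketch, with hypothetical names, counts the 7-bit codes needed for a Turkish text under GSM-single and under the Turkish locking shift table. It assumes that every character of the text is in the relevant tables, and it does not count the user data header that a message must carry to identify the national language tables in use.</p>
<codeblock outputclass="language-python">
```python
# Characters that GSM-single must take from the Turkish single shift table
# (Table 2); each costs two codes: the escape character plus the character.
SINGLE_SHIFT_ONLY = set("\u0130\u00e7\u0131\u011e\u011f\u015e\u015f")


def codes_gsm_single(text):
    """7-bit codes needed with GSM-single (default + single shift table)."""
    return sum(2 if ch in SINGLE_SHIFT_ONLY else 1 for ch in text)


def codes_locking(text):
    """7-bit codes needed when every character is in the locking shift table."""
    return len(text)


for text in ["Merhaba d\u00fcnya", "I\u015f\u0131k"]:  # "Merhaba dünya", "Işık"
    print(codes_gsm_single(text), codes_locking(text))
# 13 13
# 6 4
```
</codeblock>
<p>A text without Turkish-specific letters costs the same either way, while each such letter adds one code under GSM-single.</p></example>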
<section><title>See also</title> <p> <xref href="GUID-0BC9A9A1-DB99-5095-8390-E1C1B04D0080.dita">SMS
Encodings and Converters Overview</xref> </p> </section>
</conbody></concept>