The Extended SMS Converter uses the lossy mechanism to extend the alphabet of the standard converter. It maps a wider set of input character codes, including commonly used Eastern European Unicode characters, to the standard 7-bit alphabet. This section describes the Extended SMS Converter and the alphabet it supports.
Introduction
Languages supported by this converter include Croatian, Czech, Estonian, Hungarian, Icelandic, Latvian, Lithuanian, Polish, Romanian, Serbian, Slovak, Slovenian, Turkish, Portuguese and Spanish. This converter is identified by the KCharacterSetIdentifierExtendedSms7Bit UID, which is defined in the charconv.h file.
Any Unicode character that has no defined mapping is converted to a question mark (?), GSM code 0x3F. In the other direction, any code outside the GSM range 0x00 to 0x7F is converted to the Unicode replacement character, U+FFFD.
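The two fallback rules can be illustrated with a small standalone sketch. This is not the Symbian implementation: the names below are hypothetical, and the sketch models only space, digits and the unaccented letters A-Z and a-z, whose GSM codes equal their Unicode values. Every other character is treated as undefined here, although the real alphabet covers far more.

```cpp
#include <cstdint>

// Hypothetical names for this sketch; they are not Symbian identifiers.
constexpr std::uint8_t kGsmQuestionMark = 0x3F;  // '?' in the GSM 7-bit default alphabet
constexpr char32_t kReplacementChar = 0xFFFD;    // U+FFFD REPLACEMENT CHARACTER

// The sketch models only space, digits, A-Z and a-z, whose GSM codes
// equal their Unicode values. The real alphabet is much larger.
constexpr bool IsModelled(char32_t c) {
    return c == 0x20 || (c >= 0x30 && c <= 0x39) ||
           (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}

// Unicode -> GSM: anything without a mapping becomes '?'.
constexpr std::uint8_t ToGsm(char32_t unicode) {
    return IsModelled(unicode) ? static_cast<std::uint8_t>(unicode)
                               : kGsmQuestionMark;
}

// GSM -> Unicode: codes above 0x7F become U+FFFD.
// Returns false for in-range codes that this sketch does not model.
bool ToUnicode(unsigned gsm, char32_t& out) {
    if (gsm > 0x7F) { out = kReplacementChar; return true; }
    if (IsModelled(gsm)) { out = static_cast<char32_t>(gsm); return true; }
    return false;
}
```

With these definitions, ToGsm(0x4E2D) yields 0x3F because the sketch has no mapping for that code point, and ToUnicode(0x80, out) sets out to 0xFFFD because 0x80 lies outside the 7-bit range.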
Alphabet
The highlighted boxes in Figure 1 show the alphabet of the Extended SMS Converter, which is made up of:
GSM 7-bit default alphabet
GSM 7-bit default alphabet extension table
Extra lossy conversions, excluding the 9 characters listed in Table 2
Extended lossy conversions, shown as Lossy Characters 2 in Figure 1
Figure 1
Table 1 lists the extra lossy conversions supported by this converter in addition to those supported by the standard converter.
Table 1

| Character | Unicode | GSM | Converted Character |
| --- | --- | --- | --- |
| Ώ GREEK CAPITAL LETTER OMEGA WITH TONOS | U+038F | 0x15 | Ω GREEK CAPITAL LETTER OMEGA |
| (NO-BREAK SPACE) | U+00A0 | 0x20 | (SPACE) |
| « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK * | U+00AB | 0x22 | " QUOTATION MARK |
| » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK * | U+00BB | 0x22 | " QUOTATION MARK |
| ` GRAVE ACCENT | U+0060 | 0x27 | ' APOSTROPHE |
| ´ ACUTE ACCENT | U+00B4 | 0x27 | ' APOSTROPHE |
| ΄ GREEK TONOS | U+0384 | 0x27 | ' APOSTROPHE |
| ΅ GREEK DIALYTIKA TONOS | U+0385 | 0x27 | ' APOSTROPHE |
| × MULTIPLICATION SIGN | U+00D7 | 0x2A | * ASTERISK |
| ¸ CEDILLA | U+00B8 | 0x2C | , COMMA |
| SOFT HYPHEN | U+00AD | 0x2D | - HYPHEN-MINUS |
| · MIDDLE DOT | U+00B7 | 0x2E | . FULL STOP |
| ÷ DIVISION SIGN | U+00F7 | 0x2F | / SOLIDUS |
| ¹ SUPERSCRIPT ONE | U+00B9 | 0x31 | 1 DIGIT ONE |
| ² SUPERSCRIPT TWO | U+00B2 | 0x32 | 2 DIGIT TWO |
| ³ SUPERSCRIPT THREE | U+00B3 | 0x33 | 3 DIGIT THREE |
| ; GREEK QUESTION MARK (Erotimatiko) | U+037E | 0x3B | ; SEMICOLON |
| Ā LATIN CAPITAL LETTER A WITH MACRON | U+0100 | 0x41 | A LATIN CAPITAL LETTER A |
| Ă LATIN CAPITAL LETTER A WITH BREVE | U+0102 | 0x41 | A LATIN CAPITAL LETTER A |
| Ą LATIN CAPITAL LETTER A WITH OGONEK | U+0104 | 0x41 | A LATIN CAPITAL LETTER A |
| Ć LATIN CAPITAL LETTER C WITH ACUTE | U+0106 | 0x43 | C LATIN CAPITAL LETTER C |
| Ĉ LATIN CAPITAL LETTER C WITH CIRCUMFLEX | U+0108 | 0x43 | C LATIN CAPITAL LETTER C |
| Ċ LATIN CAPITAL LETTER C WITH DOT ABOVE | U+010A | 0x43 | C LATIN CAPITAL LETTER C |
| Č LATIN CAPITAL LETTER C WITH CARON | U+010C | 0x43 | C LATIN CAPITAL LETTER C |
| Ð LATIN CAPITAL LETTER ETH (Icelandic) | U+00D0 | 0x44 | D LATIN CAPITAL LETTER D |
| Ď LATIN CAPITAL LETTER D WITH CARON | U+010E | 0x44 | D LATIN CAPITAL LETTER D |
| Đ LATIN CAPITAL LETTER D WITH STROKE | U+0110 | 0x44 | D LATIN CAPITAL LETTER D |
| Ē LATIN CAPITAL LETTER E WITH MACRON | U+0112 | 0x45 | E LATIN CAPITAL LETTER E |
| Ĕ LATIN CAPITAL LETTER E WITH BREVE | U+0114 | 0x45 | E LATIN CAPITAL LETTER E |
| Ė LATIN CAPITAL LETTER E WITH DOT ABOVE | U+0116 | 0x45 | E LATIN CAPITAL LETTER E |
| Ę LATIN CAPITAL LETTER E WITH OGONEK | U+0118 | 0x45 | E LATIN CAPITAL LETTER E |
| Ě LATIN CAPITAL LETTER E WITH CARON | U+011A | 0x45 | E LATIN CAPITAL LETTER E |
| Ĝ LATIN CAPITAL LETTER G WITH CIRCUMFLEX | U+011C | 0x47 | G LATIN CAPITAL LETTER G |
| Ğ LATIN CAPITAL LETTER G WITH BREVE | U+011E | 0x47 | G LATIN CAPITAL LETTER G |
| Ġ LATIN CAPITAL LETTER G WITH DOT ABOVE | U+0120 | 0x47 | G LATIN CAPITAL LETTER G |
| Ģ LATIN CAPITAL LETTER G WITH CEDILLA | U+0122 | 0x47 | G LATIN CAPITAL LETTER G |
| Ĥ LATIN CAPITAL LETTER H WITH CIRCUMFLEX | U+0124 | 0x48 | H LATIN CAPITAL LETTER H |
| Ħ LATIN CAPITAL LETTER H WITH STROKE | U+0126 | 0x48 | H LATIN CAPITAL LETTER H |
| Ĩ LATIN CAPITAL LETTER I WITH TILDE | U+0128 | 0x49 | I LATIN CAPITAL LETTER I |
| Ī LATIN CAPITAL LETTER I WITH MACRON | U+012A | 0x49 | I LATIN CAPITAL LETTER I |
| Ĭ LATIN CAPITAL LETTER I WITH BREVE | U+012C | 0x49 | I LATIN CAPITAL LETTER I |
| Į LATIN CAPITAL LETTER I WITH OGONEK | U+012E | 0x49 | I LATIN CAPITAL LETTER I |
| İ LATIN CAPITAL LETTER I WITH DOT ABOVE | U+0130 | 0x49 | I LATIN CAPITAL LETTER I |
| Ĵ LATIN CAPITAL LETTER J WITH CIRCUMFLEX | U+0134 | 0x4A | J LATIN CAPITAL LETTER J |
| Ķ LATIN CAPITAL LETTER K WITH CEDILLA | U+0136 | 0x4B | K LATIN CAPITAL LETTER K |
| Ĺ LATIN CAPITAL LETTER L WITH ACUTE | U+0139 | 0x4C | L LATIN CAPITAL LETTER L |
| Ļ LATIN CAPITAL LETTER L WITH CEDILLA | U+013B | 0x4C | L LATIN CAPITAL LETTER L |
| Ľ LATIN CAPITAL LETTER L WITH CARON | U+013D | 0x4C | L LATIN CAPITAL LETTER L |
| Ŀ LATIN CAPITAL LETTER L WITH MIDDLE DOT | U+013F | 0x4C | L LATIN CAPITAL LETTER L |
| Ł LATIN CAPITAL LETTER L WITH STROKE | U+0141 | 0x4C | L LATIN CAPITAL LETTER L |
| Ń LATIN CAPITAL LETTER N WITH ACUTE | U+0143 | 0x4E | N LATIN CAPITAL LETTER N |
| Ņ LATIN CAPITAL LETTER N WITH CEDILLA | U+0145 | 0x4E | N LATIN CAPITAL LETTER N |
| Ň LATIN CAPITAL LETTER N WITH CARON | U+0147 | 0x4E | N LATIN CAPITAL LETTER N |
| Ŋ LATIN CAPITAL LETTER ENG (Sami) | U+014A | 0x4E | N LATIN CAPITAL LETTER N |
| Ō LATIN CAPITAL LETTER O WITH MACRON | U+014C | 0x4F | O LATIN CAPITAL LETTER O |
| Ŏ LATIN CAPITAL LETTER O WITH BREVE | U+014E | 0x4F | O LATIN CAPITAL LETTER O |
| Œ LATIN CAPITAL LIGATURE OE | U+0152 | 0x4F | O LATIN CAPITAL LETTER O |
| Ŕ LATIN CAPITAL LETTER R WITH ACUTE | U+0154 | 0x52 | R LATIN CAPITAL LETTER R |
| Ŗ LATIN CAPITAL LETTER R WITH CEDILLA | U+0156 | 0x52 | R LATIN CAPITAL LETTER R |
| Ř LATIN CAPITAL LETTER R WITH CARON | U+0158 | 0x52 | R LATIN CAPITAL LETTER R |
| Ś LATIN CAPITAL LETTER S WITH ACUTE | U+015A | 0x53 | S LATIN CAPITAL LETTER S |
| Ŝ LATIN CAPITAL LETTER S WITH CIRCUMFLEX | U+015C | 0x53 | S LATIN CAPITAL LETTER S |
| Ş LATIN CAPITAL LETTER S WITH CEDILLA * | U+015E | 0x53 | S LATIN CAPITAL LETTER S |
| Š LATIN CAPITAL LETTER S WITH CARON | U+0160 | 0x53 | S LATIN CAPITAL LETTER S |
| Þ LATIN CAPITAL LETTER THORN (Icelandic) | U+00DE | 0x54 | T LATIN CAPITAL LETTER T |
| Ţ LATIN CAPITAL LETTER T WITH CEDILLA * | U+0162 | 0x54 | T LATIN CAPITAL LETTER T |
| Ť LATIN CAPITAL LETTER T WITH CARON | U+0164 | 0x54 | T LATIN CAPITAL LETTER T |
| Ŧ LATIN CAPITAL LETTER T WITH STROKE | U+0166 | 0x54 | T LATIN CAPITAL LETTER T |
| Ũ LATIN CAPITAL LETTER U WITH TILDE | U+0168 | 0x55 | U LATIN CAPITAL LETTER U |
| Ū LATIN CAPITAL LETTER U WITH MACRON | U+016A | 0x55 | U LATIN CAPITAL LETTER U |
| Ŭ LATIN CAPITAL LETTER U WITH BREVE | U+016C | 0x55 | U LATIN CAPITAL LETTER U |
| Ů LATIN CAPITAL LETTER U WITH RING ABOVE | U+016E | 0x55 | U LATIN CAPITAL LETTER U |
| Ų LATIN CAPITAL LETTER U WITH OGONEK | U+0172 | 0x55 | U LATIN CAPITAL LETTER U |
| Ŵ LATIN CAPITAL LETTER W WITH CIRCUMFLEX | U+0174 | 0x57 | W LATIN CAPITAL LETTER W |
| Ŷ LATIN CAPITAL LETTER Y WITH CIRCUMFLEX | U+0176 | 0x59 | Y LATIN CAPITAL LETTER Y |
| Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS | U+0178 | 0x59 | Y LATIN CAPITAL LETTER Y |
| Ź LATIN CAPITAL LETTER Z WITH ACUTE | U+0179 | 0x5A | Z LATIN CAPITAL LETTER Z |
| Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE | U+017B | 0x5A | Z LATIN CAPITAL LETTER Z |
| Ž LATIN CAPITAL LETTER Z WITH CARON | U+017D | 0x5A | Z LATIN CAPITAL LETTER Z |
| Ö LATIN CAPITAL LETTER O WITH DIAERESIS | U+00D6 | 0x5C | Ö LATIN CAPITAL LETTER O WITH DIAERESIS |
| Ő LATIN CAPITAL LETTER O WITH DOUBLE ACUTE | U+0150 | 0x5C | Ö LATIN CAPITAL LETTER O WITH DIAERESIS |
| Ű LATIN CAPITAL LETTER U WITH DOUBLE ACUTE | U+0170 | 0x5E | Ü LATIN CAPITAL LETTER U WITH DIAERESIS |
| ā LATIN SMALL LETTER A WITH MACRON | U+0101 | 0x61 | a LATIN SMALL LETTER A |
| ă LATIN SMALL LETTER A WITH BREVE | U+0103 | 0x61 | a LATIN SMALL LETTER A |
| ą LATIN SMALL LETTER A WITH OGONEK | U+0105 | 0x61 | a LATIN SMALL LETTER A |
| ª FEMININE ORDINAL INDICATOR | U+00AA | 0x61 | a LATIN SMALL LETTER A |
| ć LATIN SMALL LETTER C WITH ACUTE | U+0107 | 0x63 | c LATIN SMALL LETTER C |
| ĉ LATIN SMALL LETTER C WITH CIRCUMFLEX | U+0109 | 0x63 | c LATIN SMALL LETTER C |
| ċ LATIN SMALL LETTER C WITH DOT ABOVE | U+010B | 0x63 | c LATIN SMALL LETTER C |
| č LATIN SMALL LETTER C WITH CARON | U+010D | 0x63 | c LATIN SMALL LETTER C |
| ¢ CENT SIGN | U+00A2 | 0x63 | c LATIN SMALL LETTER C |
| © COPYRIGHT SIGN | U+00A9 | 0x63 | c LATIN SMALL LETTER C |
| ð LATIN SMALL LETTER ETH (Icelandic) | U+00F0 | 0x64 | d LATIN SMALL LETTER D |
| ď LATIN SMALL LETTER D WITH CARON | U+010F | 0x64 | d LATIN SMALL LETTER D |
| đ LATIN SMALL LETTER D WITH STROKE | U+0111 | 0x64 | d LATIN SMALL LETTER D |
| ē LATIN SMALL LETTER E WITH MACRON | U+0113 | 0x65 | e LATIN SMALL LETTER E |
| ĕ LATIN SMALL LETTER E WITH BREVE | U+0115 | 0x65 | e LATIN SMALL LETTER E |
| ė LATIN SMALL LETTER E WITH DOT ABOVE | U+0117 | 0x65 | e LATIN SMALL LETTER E |
| ę LATIN SMALL LETTER E WITH OGONEK | U+0119 | 0x65 | e LATIN SMALL LETTER E |
| ě LATIN SMALL LETTER E WITH CARON | U+011B | 0x65 | e LATIN SMALL LETTER E |
| ĝ LATIN SMALL LETTER G WITH CIRCUMFLEX | U+011D | 0x67 | g LATIN SMALL LETTER G |
| ğ LATIN SMALL LETTER G WITH BREVE | U+011F | 0x67 | g LATIN SMALL LETTER G |
| ġ LATIN SMALL LETTER G WITH DOT ABOVE | U+0121 | 0x67 | g LATIN SMALL LETTER G |
| ģ LATIN SMALL LETTER G WITH CEDILLA | U+0123 | 0x67 | g LATIN SMALL LETTER G |
| ĥ LATIN SMALL LETTER H WITH CIRCUMFLEX | U+0125 | 0x68 | h LATIN SMALL LETTER H |
| ħ LATIN SMALL LETTER H WITH STROKE | U+0127 | 0x68 | h LATIN SMALL LETTER H |
| ĩ LATIN SMALL LETTER I WITH TILDE | U+0129 | 0x69 | i LATIN SMALL LETTER I |
| ī LATIN SMALL LETTER I WITH MACRON | U+012B | 0x69 | i LATIN SMALL LETTER I |
| ĭ LATIN SMALL LETTER I WITH BREVE | U+012D | 0x69 | i LATIN SMALL LETTER I |
| į LATIN SMALL LETTER I WITH OGONEK | U+012F | 0x69 | i LATIN SMALL LETTER I |
| ı LATIN SMALL LETTER DOTLESS I | U+0131 | 0x69 | i LATIN SMALL LETTER I |
| ĵ LATIN SMALL LETTER J WITH CIRCUMFLEX | U+0135 | 0x6A | j LATIN SMALL LETTER J |
| ķ LATIN SMALL LETTER K WITH CEDILLA | U+0137 | 0x6B | k LATIN SMALL LETTER K |
| ĸ LATIN SMALL LETTER KRA (Greenlandic) | U+0138 | 0x6B | k LATIN SMALL LETTER K |
| ĺ LATIN SMALL LETTER L WITH ACUTE | U+013A | 0x6C | l LATIN SMALL LETTER L |
| ļ LATIN SMALL LETTER L WITH CEDILLA | U+013C | 0x6C | l LATIN SMALL LETTER L |
| ľ LATIN SMALL LETTER L WITH CARON | U+013E | 0x6C | l LATIN SMALL LETTER L |
| ŀ LATIN SMALL LETTER L WITH MIDDLE DOT | U+0140 | 0x6C | l LATIN SMALL LETTER L |
| ł LATIN SMALL LETTER L WITH STROKE | U+0142 | 0x6C | l LATIN SMALL LETTER L |
| ń LATIN SMALL LETTER N WITH ACUTE | U+0144 | 0x6E | n LATIN SMALL LETTER N |
| ņ LATIN SMALL LETTER N WITH CEDILLA | U+0146 | 0x6E | n LATIN SMALL LETTER N |
| ň LATIN SMALL LETTER N WITH CARON | U+0148 | 0x6E | n LATIN SMALL LETTER N |
| ʼn LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | U+0149 | 0x6E | n LATIN SMALL LETTER N |
| ŋ LATIN SMALL LETTER ENG (Sami) | U+014B | 0x6E | n LATIN SMALL LETTER N |
| ō LATIN SMALL LETTER O WITH MACRON | U+014D | 0x6F | o LATIN SMALL LETTER O |
| ŏ LATIN SMALL LETTER O WITH BREVE | U+014F | 0x6F | o LATIN SMALL LETTER O |
| ° DEGREE SIGN | U+00B0 | 0x6F | o LATIN SMALL LETTER O |
| º MASCULINE ORDINAL INDICATOR | U+00BA | 0x6F | o LATIN SMALL LETTER O |
| œ LATIN SMALL LIGATURE OE | U+0153 | 0x6F | o LATIN SMALL LETTER O |
| ŕ LATIN SMALL LETTER R WITH ACUTE | U+0155 | 0x72 | r LATIN SMALL LETTER R |
| ŗ LATIN SMALL LETTER R WITH CEDILLA | U+0157 | 0x72 | r LATIN SMALL LETTER R |
| ř LATIN SMALL LETTER R WITH CARON | U+0159 | 0x72 | r LATIN SMALL LETTER R |
| ® REGISTERED SIGN | U+00AE | 0x72 | r LATIN SMALL LETTER R |
| ś LATIN SMALL LETTER S WITH ACUTE | U+015B | 0x73 | s LATIN SMALL LETTER S |
| ŝ LATIN SMALL LETTER S WITH CIRCUMFLEX | U+015D | 0x73 | s LATIN SMALL LETTER S |
| ş LATIN SMALL LETTER S WITH CEDILLA * | U+015F | 0x73 | s LATIN SMALL LETTER S |
| š LATIN SMALL LETTER S WITH CARON | U+0161 | 0x73 | s LATIN SMALL LETTER S |
| þ LATIN SMALL LETTER THORN (Icelandic) | U+00FE | 0x74 | t LATIN SMALL LETTER T |
| ţ LATIN SMALL LETTER T WITH CEDILLA * | U+0163 | 0x74 | t LATIN SMALL LETTER T |
| ť LATIN SMALL LETTER T WITH CARON | U+0165 | 0x74 | t LATIN SMALL LETTER T |
| ŧ LATIN SMALL LETTER T WITH STROKE | U+0167 | 0x74 | t LATIN SMALL LETTER T |
| ũ LATIN SMALL LETTER U WITH TILDE | U+0169 | 0x75 | u LATIN SMALL LETTER U |
| ū LATIN SMALL LETTER U WITH MACRON | U+016B | 0x75 | u LATIN SMALL LETTER U |
| ŭ LATIN SMALL LETTER U WITH BREVE | U+016D | 0x75 | u LATIN SMALL LETTER U |
| ů LATIN SMALL LETTER U WITH RING ABOVE | U+016F | 0x75 | u LATIN SMALL LETTER U |
| ų LATIN SMALL LETTER U WITH OGONEK | U+0173 | 0x75 | u LATIN SMALL LETTER U |
| µ MICRO SIGN | U+00B5 | 0x75 | u LATIN SMALL LETTER U |
| ŵ LATIN SMALL LETTER W WITH CIRCUMFLEX | U+0175 | 0x77 | w LATIN SMALL LETTER W |
| ŷ LATIN SMALL LETTER Y WITH CIRCUMFLEX | U+0177 | 0x79 | y LATIN SMALL LETTER Y |
| ź LATIN SMALL LETTER Z WITH ACUTE | U+017A | 0x7A | z LATIN SMALL LETTER Z |
| ż LATIN SMALL LETTER Z WITH DOT ABOVE | U+017C | 0x7A | z LATIN SMALL LETTER Z |
| ž LATIN SMALL LETTER Z WITH CARON | U+017E | 0x7A | z LATIN SMALL LETTER Z |
| ő LATIN SMALL LETTER O WITH DOUBLE ACUTE | U+0151 | 0x7C | ö LATIN SMALL LETTER O WITH DIAERESIS |
| ű LATIN SMALL LETTER U WITH DOUBLE ACUTE | U+0171 | 0x7E | ü LATIN SMALL LETTER U WITH DIAERESIS |
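A lossy conversion of this kind amounts to a table lookup. The following is a minimal sketch in standard C++, assuming a linear scan over a handful of rows copied from Table 1. The type and function names are hypothetical and are not part of charconv.h, and the actual converter may organise its data quite differently.

```cpp
#include <cstdint>

// One row of the lossy table: input code point and resulting GSM code.
struct LossyEntry {
    char32_t unicode;
    std::uint8_t gsm;
};

// A few sample rows taken from Table 1 (the full table is much longer).
constexpr LossyEntry kSampleLossyTable[] = {
    {0x00A0, 0x20},  // NO-BREAK SPACE -> SPACE
    {0x00AB, 0x22},  // LEFT-POINTING DOUBLE ANGLE QUOTATION MARK -> QUOTATION MARK
    {0x00D7, 0x2A},  // MULTIPLICATION SIGN -> ASTERISK
    {0x0104, 0x41},  // A WITH OGONEK -> A
    {0x010D, 0x63},  // c WITH CARON -> c
    {0x0141, 0x4C},  // L WITH STROKE -> L
    {0x0150, 0x5C},  // O WITH DOUBLE ACUTE -> O WITH DIAERESIS
    {0x0171, 0x7E},  // u WITH DOUBLE ACUTE -> u WITH DIAERESIS
    {0x017E, 0x7A},  // z WITH CARON -> z
    {0x038F, 0x15},  // OMEGA WITH TONOS -> OMEGA
};

// Assumption for this sketch: unmapped input falls back to '?' (GSM 0x3F).
constexpr std::uint8_t kFallbackGsmCode = 0x3F;

// Returns the GSM code for a code point in the sample table,
// or the fallback code when the code point is not listed.
std::uint8_t LossyToGsm(char32_t unicode) {
    for (const LossyEntry& entry : kSampleLossyTable) {
        if (entry.unicode == unicode) {
            return entry.gsm;
        }
    }
    return kFallbackGsmCode;
}
```

For instance, LossyToGsm(0x0141) returns 0x4C, so Ł is sent as L. A complete table would hold every row of Table 1 and would probably be sorted and binary-searched rather than scanned linearly.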
Table 2 lists the 9 characters in Lossy Characters 1 that are supported by the Standard SMS Converter but not by the Extended SMS Converter.
Table 2

| Character | Unicode | GSM | Converted Character |
| --- | --- | --- | --- |
| ϕ GREEK PHI SYMBOL | U+03D5 | 0x12 | Φ GREEK CAPITAL LETTER PHI |
| Ω OHM SIGN | U+2126 | 0x15 | Ω GREEK CAPITAL LETTER OMEGA |
| ∏ N-ARY PRODUCT | U+220F | 0x16 | Π GREEK CAPITAL LETTER PI |
| ∑ N-ARY SUMMATION | U+2211 | 0x18 | Σ GREEK CAPITAL LETTER SIGMA |
| ϑ GREEK THETA SYMBOL | U+03D1 | 0x19 | Θ GREEK CAPITAL LETTER THETA |
| ϐ GREEK BETA SYMBOL | U+03D0 | 0x42 | B LATIN CAPITAL LETTER B |
| ϒ GREEK UPSILON WITH HOOK SYMBOL | U+03D2 | 0x59 | Y LATIN CAPITAL LETTER Y |
| ϓ GREEK UPSILON WITH ACUTE AND HOOK SYMBOL | U+03D3 | 0x59 | Y LATIN CAPITAL LETTER Y |
| ϔ GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL | U+03D4 | 0x59 | Y LATIN CAPITAL LETTER Y |
See also
SMS Encodings and Converters Overview