Extended SMS Converter

The Extended SMS Converter uses the lossy mechanism to extend the alphabet of the standard converter. It maps a wider set of input character codes, including commonly used Eastern European Unicode characters, to the standard 7-bit alphabet. This section describes the Extended SMS Converter and the alphabet it supports.

Introduction

Languages supported by this converter include Croatian, Czech, Estonian, Hungarian, Icelandic, Latvian, Lithuanian, Polish, Romanian, Serbian, Slovak, Slovenian, Turkish, Portuguese and Spanish. This converter is identified by the KCharacterSetIdentifierExtendedSms7Bit UID, which is defined in the charconv.h file.

Any Unicode character that has no defined mapping is converted to a question mark (?), GSM code 0x3F. Any code outside the GSM range 0x00 to 0x7F is converted to the Unicode replacement character U+FFFD.
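The two fallback rules can be sketched as follows. This is a minimal Python illustration, not the converter's actual implementation: the function names `to_gsm` and `to_unicode` and the three-entry tables are hypothetical, and the question mark is taken at its position in the GSM 7-bit default alphabet, 0x3F.

```python
GSM_QUESTION_MARK = 0x3F     # '?' in the GSM 7-bit default alphabet
REPLACEMENT_CHAR = "\uFFFD"  # Unicode replacement character

# Hypothetical three-entry tables, for illustration only.
UNICODE_TO_GSM = {"A": 0x41, "?": 0x3F, "\u00A0": 0x20}
GSM_TO_UNICODE = {0x41: "A", 0x3F: "?", 0x20: " "}


def to_gsm(text):
    """Unicode -> GSM codes; characters without a mapping become '?'."""
    return [UNICODE_TO_GSM.get(ch, GSM_QUESTION_MARK) for ch in text]


def to_unicode(codes):
    """GSM codes -> Unicode; codes outside 0x00 to 0x7F become U+FFFD."""
    out = []
    for code in codes:
        if 0x00 <= code <= 0x7F:
            # In-range codes missing from this tiny table also fall back here.
            out.append(GSM_TO_UNICODE.get(code, REPLACEMENT_CHAR))
        else:
            out.append(REPLACEMENT_CHAR)
    return "".join(out)


print(to_gsm("A\u4E2D"))               # U+4E2D has no mapping in this sketch
print(repr(to_unicode([0x41, 0x80])))  # 0x80 is outside the 7-bit range
```

Note that the two directions fail differently: an unmappable Unicode character still yields a valid GSM code, whereas an out-of-range GSM code yields a character that signals the loss explicitly.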

Alphabet

The highlighted boxes in Figure 1 illustrate the alphabet of the extended SMS converter:

  • GSM 7-bit default alphabet

  • GSM 7-bit default alphabet extension table

  • Extra lossy conversions, excluding the 9 characters listed in Table 2

  • Extended lossy conversions, shown as Lossy Characters 2 in Figure 1

Figure 1

Table 1 lists the extended lossy conversions (Lossy Characters 2) that this converter supports in addition to those supported by the standard converter.

Table 1

Character | Unicode | GSM | Converted Character
Ώ GREEK CAPITAL LETTER OMEGA WITH TONOS | U+038F | 0x15 | Ω GREEK CAPITAL LETTER OMEGA
(NO-BREAK SPACE) | U+00A0 | 0x20 | (SPACE)
« LEFT-POINTING DOUBLE ANGLE QUOTATION MARK * | U+00AB | 0x22 | " QUOTATION MARK
» RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK * | U+00BB | 0x22 | " QUOTATION MARK
` GRAVE ACCENT | U+0060 | 0x27 | ' APOSTROPHE
´ ACUTE ACCENT | U+00B4 | 0x27 | ' APOSTROPHE
΄ GREEK TONOS | U+0384 | 0x27 | ' APOSTROPHE
΅ GREEK DIALYTIKA TONOS | U+0385 | 0x27 | ' APOSTROPHE
× MULTIPLICATION SIGN | U+00D7 | 0x2A | * ASTERISK
¸ CEDILLA | U+00B8 | 0x2C | , COMMA
(SOFT HYPHEN) | U+00AD | 0x2D | - HYPHEN-MINUS
· MIDDLE DOT | U+00B7 | 0x2E | . FULL STOP
÷ DIVISION SIGN | U+00F7 | 0x2F | / SOLIDUS
¹ SUPERSCRIPT ONE | U+00B9 | 0x31 | 1 DIGIT ONE
² SUPERSCRIPT TWO | U+00B2 | 0x32 | 2 DIGIT TWO
³ SUPERSCRIPT THREE | U+00B3 | 0x33 | 3 DIGIT THREE
; GREEK QUESTION MARK (Erotimatiko) | U+037E | 0x3B | ; SEMICOLON
Ā LATIN CAPITAL LETTER A WITH MACRON | U+0100 | 0x41 | A LATIN CAPITAL LETTER A
Ă LATIN CAPITAL LETTER A WITH BREVE | U+0102 | 0x41 | A LATIN CAPITAL LETTER A
Ą LATIN CAPITAL LETTER A WITH OGONEK | U+0104 | 0x41 | A LATIN CAPITAL LETTER A
Ć LATIN CAPITAL LETTER C WITH ACUTE | U+0106 | 0x43 | C LATIN CAPITAL LETTER C
Ĉ LATIN CAPITAL LETTER C WITH CIRCUMFLEX | U+0108 | 0x43 | C LATIN CAPITAL LETTER C
Ċ LATIN CAPITAL LETTER C WITH DOT ABOVE | U+010A | 0x43 | C LATIN CAPITAL LETTER C
Č LATIN CAPITAL LETTER C WITH CARON | U+010C | 0x43 | C LATIN CAPITAL LETTER C
Ð LATIN CAPITAL LETTER ETH (Icelandic) | U+00D0 | 0x44 | D LATIN CAPITAL LETTER D
Ď LATIN CAPITAL LETTER D WITH CARON | U+010E | 0x44 | D LATIN CAPITAL LETTER D
Đ LATIN CAPITAL LETTER D WITH STROKE | U+0110 | 0x44 | D LATIN CAPITAL LETTER D
Ē LATIN CAPITAL LETTER E WITH MACRON | U+0112 | 0x45 | E LATIN CAPITAL LETTER E
Ĕ LATIN CAPITAL LETTER E WITH BREVE | U+0114 | 0x45 | E LATIN CAPITAL LETTER E
Ė LATIN CAPITAL LETTER E WITH DOT ABOVE | U+0116 | 0x45 | E LATIN CAPITAL LETTER E
Ę LATIN CAPITAL LETTER E WITH OGONEK | U+0118 | 0x45 | E LATIN CAPITAL LETTER E
Ě LATIN CAPITAL LETTER E WITH CARON | U+011A | 0x45 | E LATIN CAPITAL LETTER E
Ĝ LATIN CAPITAL LETTER G WITH CIRCUMFLEX | U+011C | 0x47 | G LATIN CAPITAL LETTER G
Ğ LATIN CAPITAL LETTER G WITH BREVE | U+011E | 0x47 | G LATIN CAPITAL LETTER G
Ġ LATIN CAPITAL LETTER G WITH DOT ABOVE | U+0120 | 0x47 | G LATIN CAPITAL LETTER G
Ģ LATIN CAPITAL LETTER G WITH CEDILLA | U+0122 | 0x47 | G LATIN CAPITAL LETTER G
Ĥ LATIN CAPITAL LETTER H WITH CIRCUMFLEX | U+0124 | 0x48 | H LATIN CAPITAL LETTER H
Ħ LATIN CAPITAL LETTER H WITH STROKE | U+0126 | 0x48 | H LATIN CAPITAL LETTER H
Ĩ LATIN CAPITAL LETTER I WITH TILDE | U+0128 | 0x49 | I LATIN CAPITAL LETTER I
Ī LATIN CAPITAL LETTER I WITH MACRON | U+012A | 0x49 | I LATIN CAPITAL LETTER I
Ĭ LATIN CAPITAL LETTER I WITH BREVE | U+012C | 0x49 | I LATIN CAPITAL LETTER I
Į LATIN CAPITAL LETTER I WITH OGONEK | U+012E | 0x49 | I LATIN CAPITAL LETTER I
İ LATIN CAPITAL LETTER I WITH DOT ABOVE | U+0130 | 0x49 | I LATIN CAPITAL LETTER I
Ĵ LATIN CAPITAL LETTER J WITH CIRCUMFLEX | U+0134 | 0x4A | J LATIN CAPITAL LETTER J
Ķ LATIN CAPITAL LETTER K WITH CEDILLA | U+0136 | 0x4B | K LATIN CAPITAL LETTER K
Ĺ LATIN CAPITAL LETTER L WITH ACUTE | U+0139 | 0x4C | L LATIN CAPITAL LETTER L
Ļ LATIN CAPITAL LETTER L WITH CEDILLA | U+013B | 0x4C | L LATIN CAPITAL LETTER L
Ľ LATIN CAPITAL LETTER L WITH CARON | U+013D | 0x4C | L LATIN CAPITAL LETTER L
Ŀ LATIN CAPITAL LETTER L WITH MIDDLE DOT | U+013F | 0x4C | L LATIN CAPITAL LETTER L
Ł LATIN CAPITAL LETTER L WITH STROKE | U+0141 | 0x4C | L LATIN CAPITAL LETTER L
Ń LATIN CAPITAL LETTER N WITH ACUTE | U+0143 | 0x4E | N LATIN CAPITAL LETTER N
Ņ LATIN CAPITAL LETTER N WITH CEDILLA | U+0145 | 0x4E | N LATIN CAPITAL LETTER N
Ň LATIN CAPITAL LETTER N WITH CARON | U+0147 | 0x4E | N LATIN CAPITAL LETTER N
Ŋ LATIN CAPITAL LETTER ENG (Sami) | U+014A | 0x4E | N LATIN CAPITAL LETTER N
Ō LATIN CAPITAL LETTER O WITH MACRON | U+014C | 0x4F | O LATIN CAPITAL LETTER O
Ŏ LATIN CAPITAL LETTER O WITH BREVE | U+014E | 0x4F | O LATIN CAPITAL LETTER O
Œ LATIN CAPITAL LIGATURE OE | U+0152 | 0x4F | O LATIN CAPITAL LETTER O
Ŕ LATIN CAPITAL LETTER R WITH ACUTE | U+0154 | 0x52 | R LATIN CAPITAL LETTER R
Ŗ LATIN CAPITAL LETTER R WITH CEDILLA | U+0156 | 0x52 | R LATIN CAPITAL LETTER R
Ř LATIN CAPITAL LETTER R WITH CARON | U+0158 | 0x52 | R LATIN CAPITAL LETTER R
Ś LATIN CAPITAL LETTER S WITH ACUTE | U+015A | 0x53 | S LATIN CAPITAL LETTER S
Ŝ LATIN CAPITAL LETTER S WITH CIRCUMFLEX | U+015C | 0x53 | S LATIN CAPITAL LETTER S
Ş LATIN CAPITAL LETTER S WITH CEDILLA * | U+015E | 0x53 | S LATIN CAPITAL LETTER S
Š LATIN CAPITAL LETTER S WITH CARON | U+0160 | 0x53 | S LATIN CAPITAL LETTER S
Þ LATIN CAPITAL LETTER THORN (Icelandic) | U+00DE | 0x54 | T LATIN CAPITAL LETTER T
Ţ LATIN CAPITAL LETTER T WITH CEDILLA * | U+0162 | 0x54 | T LATIN CAPITAL LETTER T
Ť LATIN CAPITAL LETTER T WITH CARON | U+0164 | 0x54 | T LATIN CAPITAL LETTER T
Ŧ LATIN CAPITAL LETTER T WITH STROKE | U+0166 | 0x54 | T LATIN CAPITAL LETTER T
Ũ LATIN CAPITAL LETTER U WITH TILDE | U+0168 | 0x55 | U LATIN CAPITAL LETTER U
Ū LATIN CAPITAL LETTER U WITH MACRON | U+016A | 0x55 | U LATIN CAPITAL LETTER U
Ŭ LATIN CAPITAL LETTER U WITH BREVE | U+016C | 0x55 | U LATIN CAPITAL LETTER U
Ů LATIN CAPITAL LETTER U WITH RING ABOVE | U+016E | 0x55 | U LATIN CAPITAL LETTER U
Ų LATIN CAPITAL LETTER U WITH OGONEK | U+0172 | 0x55 | U LATIN CAPITAL LETTER U
Ŵ LATIN CAPITAL LETTER W WITH CIRCUMFLEX | U+0174 | 0x57 | W LATIN CAPITAL LETTER W
Ŷ LATIN CAPITAL LETTER Y WITH CIRCUMFLEX | U+0176 | 0x59 | Y LATIN CAPITAL LETTER Y
Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS | U+0178 | 0x59 | Y LATIN CAPITAL LETTER Y
Ź LATIN CAPITAL LETTER Z WITH ACUTE | U+0179 | 0x5A | Z LATIN CAPITAL LETTER Z
Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE | U+017B | 0x5A | Z LATIN CAPITAL LETTER Z
Ž LATIN CAPITAL LETTER Z WITH CARON | U+017D | 0x5A | Z LATIN CAPITAL LETTER Z
Ö LATIN CAPITAL LETTER O WITH DIAERESIS | U+00D6 | 0x5C | Ö LATIN CAPITAL LETTER O WITH DIAERESIS
Ő LATIN CAPITAL LETTER O WITH DOUBLE ACUTE | U+0150 | 0x5C | Ö LATIN CAPITAL LETTER O WITH DIAERESIS
Ű LATIN CAPITAL LETTER U WITH DOUBLE ACUTE | U+0170 | 0x5E | Ü LATIN CAPITAL LETTER U WITH DIAERESIS
ā LATIN SMALL LETTER A WITH MACRON | U+0101 | 0x61 | a LATIN SMALL LETTER A
ă LATIN SMALL LETTER A WITH BREVE | U+0103 | 0x61 | a LATIN SMALL LETTER A
ą LATIN SMALL LETTER A WITH OGONEK | U+0105 | 0x61 | a LATIN SMALL LETTER A
ª FEMININE ORDINAL INDICATOR | U+00AA | 0x61 | a LATIN SMALL LETTER A
ć LATIN SMALL LETTER C WITH ACUTE | U+0107 | 0x63 | c LATIN SMALL LETTER C
ĉ LATIN SMALL LETTER C WITH CIRCUMFLEX | U+0109 | 0x63 | c LATIN SMALL LETTER C
ċ LATIN SMALL LETTER C WITH DOT ABOVE | U+010B | 0x63 | c LATIN SMALL LETTER C
č LATIN SMALL LETTER C WITH CARON | U+010D | 0x63 | c LATIN SMALL LETTER C
¢ CENT SIGN | U+00A2 | 0x63 | c LATIN SMALL LETTER C
© COPYRIGHT SIGN | U+00A9 | 0x63 | c LATIN SMALL LETTER C
ð LATIN SMALL LETTER ETH (Icelandic) | U+00F0 | 0x64 | d LATIN SMALL LETTER D
ď LATIN SMALL LETTER D WITH CARON | U+010F | 0x64 | d LATIN SMALL LETTER D
đ LATIN SMALL LETTER D WITH STROKE | U+0111 | 0x64 | d LATIN SMALL LETTER D
ē LATIN SMALL LETTER E WITH MACRON | U+0113 | 0x65 | e LATIN SMALL LETTER E
ĕ LATIN SMALL LETTER E WITH BREVE | U+0115 | 0x65 | e LATIN SMALL LETTER E
ė LATIN SMALL LETTER E WITH DOT ABOVE | U+0117 | 0x65 | e LATIN SMALL LETTER E
ę LATIN SMALL LETTER E WITH OGONEK | U+0119 | 0x65 | e LATIN SMALL LETTER E
ě LATIN SMALL LETTER E WITH CARON | U+011B | 0x65 | e LATIN SMALL LETTER E
ĝ LATIN SMALL LETTER G WITH CIRCUMFLEX | U+011D | 0x67 | g LATIN SMALL LETTER G
ğ LATIN SMALL LETTER G WITH BREVE | U+011F | 0x67 | g LATIN SMALL LETTER G
ġ LATIN SMALL LETTER G WITH DOT ABOVE | U+0121 | 0x67 | g LATIN SMALL LETTER G
ģ LATIN SMALL LETTER G WITH CEDILLA | U+0123 | 0x67 | g LATIN SMALL LETTER G
ĥ LATIN SMALL LETTER H WITH CIRCUMFLEX | U+0125 | 0x68 | h LATIN SMALL LETTER H
ħ LATIN SMALL LETTER H WITH STROKE | U+0127 | 0x68 | h LATIN SMALL LETTER H
ĩ LATIN SMALL LETTER I WITH TILDE | U+0129 | 0x69 | i LATIN SMALL LETTER I
ī LATIN SMALL LETTER I WITH MACRON | U+012B | 0x69 | i LATIN SMALL LETTER I
ĭ LATIN SMALL LETTER I WITH BREVE | U+012D | 0x69 | i LATIN SMALL LETTER I
į LATIN SMALL LETTER I WITH OGONEK | U+012F | 0x69 | i LATIN SMALL LETTER I
ı LATIN SMALL LETTER DOTLESS I | U+0131 | 0x69 | i LATIN SMALL LETTER I
ĵ LATIN SMALL LETTER J WITH CIRCUMFLEX | U+0135 | 0x6A | j LATIN SMALL LETTER J
ķ LATIN SMALL LETTER K WITH CEDILLA | U+0137 | 0x6B | k LATIN SMALL LETTER K
ĸ LATIN SMALL LETTER KRA (Greenlandic) | U+0138 | 0x6B | k LATIN SMALL LETTER K
ĺ LATIN SMALL LETTER L WITH ACUTE | U+013A | 0x6C | l LATIN SMALL LETTER L
ļ LATIN SMALL LETTER L WITH CEDILLA | U+013C | 0x6C | l LATIN SMALL LETTER L
ľ LATIN SMALL LETTER L WITH CARON | U+013E | 0x6C | l LATIN SMALL LETTER L
ŀ LATIN SMALL LETTER L WITH MIDDLE DOT | U+0140 | 0x6C | l LATIN SMALL LETTER L
ł LATIN SMALL LETTER L WITH STROKE | U+0142 | 0x6C | l LATIN SMALL LETTER L
ń LATIN SMALL LETTER N WITH ACUTE | U+0144 | 0x6E | n LATIN SMALL LETTER N
ņ LATIN SMALL LETTER N WITH CEDILLA | U+0146 | 0x6E | n LATIN SMALL LETTER N
ň LATIN SMALL LETTER N WITH CARON | U+0148 | 0x6E | n LATIN SMALL LETTER N
ŉ LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | U+0149 | 0x6E | n LATIN SMALL LETTER N
ŋ LATIN SMALL LETTER ENG (Sami) | U+014B | 0x6E | n LATIN SMALL LETTER N
ō LATIN SMALL LETTER O WITH MACRON | U+014D | 0x6F | o LATIN SMALL LETTER O
ŏ LATIN SMALL LETTER O WITH BREVE | U+014F | 0x6F | o LATIN SMALL LETTER O
° DEGREE SIGN | U+00B0 | 0x6F | o LATIN SMALL LETTER O
º MASCULINE ORDINAL INDICATOR | U+00BA | 0x6F | o LATIN SMALL LETTER O
œ LATIN SMALL LIGATURE OE | U+0153 | 0x6F | o LATIN SMALL LETTER O
ŕ LATIN SMALL LETTER R WITH ACUTE | U+0155 | 0x72 | r LATIN SMALL LETTER R
ŗ LATIN SMALL LETTER R WITH CEDILLA | U+0157 | 0x72 | r LATIN SMALL LETTER R
ř LATIN SMALL LETTER R WITH CARON | U+0159 | 0x72 | r LATIN SMALL LETTER R
® REGISTERED SIGN | U+00AE | 0x72 | r LATIN SMALL LETTER R
ś LATIN SMALL LETTER S WITH ACUTE | U+015B | 0x73 | s LATIN SMALL LETTER S
ŝ LATIN SMALL LETTER S WITH CIRCUMFLEX | U+015D | 0x73 | s LATIN SMALL LETTER S
ş LATIN SMALL LETTER S WITH CEDILLA * | U+015F | 0x73 | s LATIN SMALL LETTER S
š LATIN SMALL LETTER S WITH CARON | U+0161 | 0x73 | s LATIN SMALL LETTER S
þ LATIN SMALL LETTER THORN (Icelandic) | U+00FE | 0x74 | t LATIN SMALL LETTER T
ţ LATIN SMALL LETTER T WITH CEDILLA * | U+0163 | 0x74 | t LATIN SMALL LETTER T
ť LATIN SMALL LETTER T WITH CARON | U+0165 | 0x74 | t LATIN SMALL LETTER T
ŧ LATIN SMALL LETTER T WITH STROKE | U+0167 | 0x74 | t LATIN SMALL LETTER T
ũ LATIN SMALL LETTER U WITH TILDE | U+0169 | 0x75 | u LATIN SMALL LETTER U
ū LATIN SMALL LETTER U WITH MACRON | U+016B | 0x75 | u LATIN SMALL LETTER U
ŭ LATIN SMALL LETTER U WITH BREVE | U+016D | 0x75 | u LATIN SMALL LETTER U
ů LATIN SMALL LETTER U WITH RING ABOVE | U+016F | 0x75 | u LATIN SMALL LETTER U
ų LATIN SMALL LETTER U WITH OGONEK | U+0173 | 0x75 | u LATIN SMALL LETTER U
µ MICRO SIGN | U+00B5 | 0x75 | u LATIN SMALL LETTER U
ŵ LATIN SMALL LETTER W WITH CIRCUMFLEX | U+0175 | 0x77 | w LATIN SMALL LETTER W
ŷ LATIN SMALL LETTER Y WITH CIRCUMFLEX | U+0177 | 0x79 | y LATIN SMALL LETTER Y
ź LATIN SMALL LETTER Z WITH ACUTE | U+017A | 0x7A | z LATIN SMALL LETTER Z
ż LATIN SMALL LETTER Z WITH DOT ABOVE | U+017C | 0x7A | z LATIN SMALL LETTER Z
ž LATIN SMALL LETTER Z WITH CARON | U+017E | 0x7A | z LATIN SMALL LETTER Z
ő LATIN SMALL LETTER O WITH DOUBLE ACUTE | U+0151 | 0x7C | ö LATIN SMALL LETTER O WITH DIAERESIS
ű LATIN SMALL LETTER U WITH DOUBLE ACUTE | U+0171 | 0x7E | ü LATIN SMALL LETTER U WITH DIAERESIS
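The Table 1 conversions amount to a many-to-one lookup from Unicode characters to GSM codes. The sketch below copies only a handful of rows from the table to illustrate this; the names `EXTENDED_LOSSY` and `gsm_code` and the sample string are hypothetical, and the assumption that basic Latin letters and digits keep their ASCII positions reflects the GSM 7-bit default alphabet rather than anything stated in this section.

```python
import string

# A handful of rows copied from Table 1 (Unicode character -> GSM code).
EXTENDED_LOSSY = {
    "\u00AB": 0x22,  # « -> "
    "\u0141": 0x4C,  # Ł -> L
    "\u0142": 0x6C,  # ł -> l
    "\u0107": 0x63,  # ć -> c
    "\u015B": 0x73,  # ś -> s
    "\u0151": 0x7C,  # ő -> ö
    "\u0170": 0x5E,  # Ű -> Ü
}

# Assumption: basic Latin letters and digits sit at their ASCII positions
# in the GSM 7-bit default alphabet.
ASCII_ALNUM = set(string.ascii_letters + string.digits)


def gsm_code(ch):
    """Return the GSM code for ch, or None if this sketch has no entry."""
    if ch in EXTENDED_LOSSY:
        return EXTENDED_LOSSY[ch]
    if ch in ASCII_ALNUM:
        return ord(ch)
    return None


sample = "\u015Bwi\u0107"  # hypothetical sample input
print([hex(gsm_code(ch)) for ch in sample])
```

Because ś and s both produce 0x73, the receiving side cannot recover the accent: this is what makes the conversion lossy.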

Table 2 lists the 9 characters in Lossy Characters 1 that the standard SMS converter supports but the Extended SMS Converter does not.

Table 2

Character | Unicode | GSM | Converted Character
ϕ GREEK PHI SYMBOL | U+03D5 | 0x12 | Φ GREEK CAPITAL LETTER PHI
Ω OHM SIGN | U+2126 | 0x15 | Ω GREEK CAPITAL LETTER OMEGA
∏ N-ARY PRODUCT | U+220F | 0x16 | Π GREEK CAPITAL LETTER PI
∑ N-ARY SUMMATION | U+2211 | 0x18 | Σ GREEK CAPITAL LETTER SIGMA
ϑ GREEK THETA SYMBOL | U+03D1 | 0x19 | Θ GREEK CAPITAL LETTER THETA
ϐ GREEK BETA SYMBOL | U+03D0 | 0x42 | B LATIN CAPITAL LETTER B
ϒ GREEK UPSILON WITH HOOK SYMBOL | U+03D2 | 0x59 | Y LATIN CAPITAL LETTER Y
ϓ GREEK UPSILON WITH ACUTE AND HOOK SYMBOL | U+03D3 | 0x59 | Y LATIN CAPITAL LETTER Y
ϔ GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL | U+03D4 | 0x59 | Y LATIN CAPITAL LETTER Y
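The effect of the Table 2 exclusions can be sketched as below. This assumes that "not supported" means the character falls through to the rule for undefined characters and is sent as a question mark (0x3F in the GSM 7-bit default alphabet); the section does not state this outright, and the names `STANDARD_ONLY` and `convert_table2_char` are hypothetical.

```python
GSM_QUESTION_MARK = 0x3F  # '?' in the GSM 7-bit default alphabet

# Table 2: Unicode code point -> GSM code produced by the standard converter.
STANDARD_ONLY = {
    0x03D5: 0x12,  # GREEK PHI SYMBOL -> PHI
    0x2126: 0x15,  # OHM SIGN -> OMEGA
    0x220F: 0x16,  # N-ARY PRODUCT -> PI
    0x2211: 0x18,  # N-ARY SUMMATION -> SIGMA
    0x03D1: 0x19,  # GREEK THETA SYMBOL -> THETA
    0x03D0: 0x42,  # GREEK BETA SYMBOL -> B
    0x03D2: 0x59,  # GREEK UPSILON WITH HOOK SYMBOL -> Y
    0x03D3: 0x59,  # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL -> Y
    0x03D4: 0x59,  # GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL -> Y
}


def convert_table2_char(ch, extended):
    """Convert one Table 2 character under the standard or extended converter.

    Raises KeyError for characters that are not listed in Table 2.
    """
    code = STANDARD_ONLY[ord(ch)]
    # Assumed behaviour: the extended converter treats these as undefined.
    return GSM_QUESTION_MARK if extended else code


print(hex(convert_table2_char("\u2126", extended=False)))
print(hex(convert_table2_char("\u2126", extended=True)))
```

Under this assumption, text containing an ohm sign or a summation sign converts more faithfully with the standard converter than with the extended one.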