SMS Encoding and Decoding

This section describes how SMS messages are encoded and decoded. Both operations are performed in the SMS stack.

The GSMU component of the SMS stack provides the methods for encoding and decoding messages. The character converter (CharConv) component provides the required plug-ins and the mapping tables for encoding and decoding.

Encoding

The default encoding method is 7-bit, which allows 160 characters per SMS message. 8-bit encoding is also possible and supports 140 characters per message. Some languages use the Latin alphabet with additional accented characters, and these accented characters cannot be encoded using the 7-bit converters. The additional characters can be supported in two ways:

Unicode

Accented characters can be encoded using 16-bit Unicode. The disadvantage is that Unicode limits the message to 70 characters, which increases the cost for the end user. For example, a 100-character message sent using Unicode encoding requires two SMS messages, whereas 7-bit encoding needs only one.
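The character limits quoted in this section all follow from a fixed 140-octet payload. The sketch below works through that arithmetic and the 100-character example. The function names are hypothetical. The 6-octet concatenation header is an assumption taken from the common GSM concatenation scheme and is not described in this section. The sketch also assumes that every character occupies exactly one code unit.

```python
import math

PAYLOAD_OCTETS = 140       # user-data octets available in one SMS message
CONCAT_HEADER_OCTETS = 6   # assumption: header added to each part of a split message


def chars_per_message(bits_per_char, header_octets=0):
    """Whole characters that fit in the payload left after any header."""
    return (PAYLOAD_OCTETS - header_octets) * 8 // bits_per_char


def messages_needed(n_chars, bits_per_char):
    """SMS messages required to carry n_chars characters."""
    if n_chars <= chars_per_message(bits_per_char):
        return 1
    # A longer text is split, and every part gives up room for the header.
    per_part = chars_per_message(bits_per_char, CONCAT_HEADER_OCTETS)
    return math.ceil(n_chars / per_part)


for bits in (7, 8, 16):
    print(bits, chars_per_message(bits), messages_needed(100, bits))
```

With these assumptions, a 100-character text fits in one 7-bit message but needs two messages when encoded as 16-bit Unicode.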

Alternative encoder

The second way to support accented Latin-based characters is to use an alternative encoder that performs a lossy conversion. In a lossy conversion, each accented character is encoded as the nearest GSM character. For example, Á is encoded as A. For more information, see Lossy 7 Bit Encoding.
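The idea of a lossy conversion can be illustrated with a short sketch. This is not the CharConv plug-in or its mapping table. The function name lossy_fold and the "?" fallback are hypothetical, and the sketch handles letters only. It keeps the accented letters that the GSM 7-bit default alphabet already contains and reduces any other accented letter to its base letter.

```python
import unicodedata

# Accented letters already present in the GSM 7-bit default alphabet.
GSM_ACCENTED = set("èéùìòÇØøÅåÆæßÉÄÖÑÜäöñüà")


def lossy_fold(text):
    """Replace accented letters outside the GSM alphabet with their base letter."""
    out = []
    for ch in text:
        # Simplification: ASCII is passed through unchanged, although a few
        # ASCII symbols are not in the basic GSM character set.
        if ch.isascii() or ch in GSM_ACCENTED:
            out.append(ch)
            continue
        # Decompose into base letter plus combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        # Hypothetical fallback for characters with no ASCII base letter.
        out.append(base if base and base.isascii() else "?")
    return "".join(out)


print(lossy_fold("\u00c1"))  # the character Á from the example above
```

Because the accents are discarded, the receiver cannot recover the original text, which is the trade-off for staying within the 7-bit limit of 160 characters.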

Decoding

Decoding SMS messages does not require an alternative converter because the existing converter can decode 16-bit Unicode characters. If the incoming message is 7-bit encoded, no information is lost.
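As a rough illustration of the two decoding paths, the sketch below decodes 16-bit user data and unpacks 7-bit user data into character codes. It is not the GSMU or CharConv API, and the function names are hypothetical. It assumes the 16-bit data is big-endian, and it stops at the 7-bit character codes: turning those codes into text requires the GSM alphabet table, which is not reproduced here.

```python
def decode_ucs2(user_data):
    """Decode 16-bit user data, assumed to be big-endian, into text."""
    return user_data.decode("utf-16-be")


def unpack_septets(user_data, count):
    """Unpack `count` 7-bit character codes from packed octets."""
    # Packed septets fill each octet from the low bit upward, so reading the
    # octets as one little-endian integer places septet i at bit 7 * i.
    bits = int.from_bytes(user_data, "little")
    return [(bits >> (7 * i)) & 0x7F for i in range(count)]


packed = bytes.fromhex("E8329BFD06")  # five characters packed into five octets
codes = unpack_septets(packed, 5)
print([hex(c) for c in codes])
print(decode_ucs2(bytes.fromhex("00C1")))
```

Both paths are exact inverses of the corresponding encoding step, so any loss of information happens only when a lossy encoder is used on the sending side.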

National Languages

Some of the national languages supported are:

  • Turkish

  • Spanish

  • Portuguese

These national languages are identified by the values of the TSmsNationalLanguageIdentifier enumeration.