Detecting a Character Set (Converter)

When the encoding of a piece of text is not known, you can use the CCnvCharacterSetConverter::AutoDetectCharSetL() function to analyse the text. The function reports the UID of the best-matching available character set (converter).

Introduction

AutoDetectCharSetL() loops through the available character sets and checks whether each character set is a plug-in. If it is, it calls the IsInThisCharacterSetL() function of the plug-in to get a confidence level.

Confidence levels are in the range 0 to 100 (inclusive), where 0 means "I have no idea" and 100 means "I have total confidence that this is the correct character set". If the confidence level is 0, the character set identifier is not defined.

Once the entire array of character sets has been tested, the character set with the highest confidence level is returned as the character set encoding for the sample text.
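The selection logic described above can be sketched in portable standard C++. This is only an illustrative model, not the Symbian implementation: CharsetCandidate, Detection, autoDetect() and asciiConfidence() are hypothetical names, the scoring callback merely stands in for a plug-in's IsInThisCharacterSetL(), and keeping the first candidate on a tie is an assumption that the text above does not specify.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one character set plug-in: an identifier plus
// a scoring function that plays the role of IsInThisCharacterSetL().
struct CharsetCandidate {
    std::uint32_t id;
    std::function<int(const std::string&)> confidence;
};

// Result of detection. The id is only meaningful when confidence > 0;
// the value 0 used here is a placeholder chosen for this sketch.
struct Detection {
    std::uint32_t id = 0;
    int confidence = 0;
};

// Score the sample against every candidate and keep the highest score.
// Assumption: on a tie, the earlier candidate wins.
Detection autoDetect(const std::vector<CharsetCandidate>& candidates,
                     const std::string& sample) {
    Detection best;
    for (const auto& candidate : candidates) {
        int level = candidate.confidence(sample);
        // Clamp to the documented 0..100 range.
        if (level < 0) level = 0;
        if (level > 100) level = 100;
        if (level > best.confidence) {
            best.id = candidate.id;
            best.confidence = level;
        }
    }
    return best;
}

// Toy scorer for this sketch: full confidence when every byte is 7-bit,
// no confidence otherwise. Real plug-ins use their own heuristics.
int asciiConfidence(const std::string& sample) {
    for (unsigned char ch : sample) {
        if (ch > 0x7F) return 0;
    }
    return 100;
}
```

Calling autoDetect() with a candidate list that includes asciiConfidence and a 7-bit sample yields that candidate's identifier with a confidence of 100. With an empty candidate list the confidence stays at 0, which mirrors the "identifier is not defined" case.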

Procedure

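  1. Create and populate an array of available character sets.

    // Open a file server session; it is pushed onto the cleanup stack
    // so that it is closed if a leave occurs.
    RFs fileServerSession;
    CleanupClosePushL(fileServerSession);
    User::LeaveIfError(fileServerSession.Connect());

    // Create the converter and list the available character sets.
    // Both objects are left on the cleanup stack (LC suffix).
    CCnvCharacterSetConverter* characterSetConverter = CCnvCharacterSetConverter::NewLC();
    CArrayFix<CCnvCharacterSetConverter::SCharacterSet>* arrayOfCharacterSetsAvailable =
        characterSetConverter->CreateArrayOfCharacterSetsAvailableLC(fileServerSession);
    ...
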
  2. Call the AutoDetectCharSetL() function to get the character set converter information.

    The sample text is ASCII-encoded plain text. For this sample, a confidence level of 100 and the matching characterSetID are returned.

    _LIT8(KASCII, "The result I am expecting is that this is recognised as ASCII!");

    // Both values are filled in by AutoDetectCharSetL().
    TInt confidenceLevel = 0;
    TUint characterSetID = 0;

    characterSetConverter->AutoDetectCharSetL(
        confidenceLevel,
        characterSetID,
        *arrayOfCharacterSetsAvailable,
        KASCII);
    ...

The value in characterSetID can then be passed to the CCnvCharacterSetConverter, for example through its PrepareToConvertToOrFromL() function, to do the conversion.