week 10 bug fix submission (SF PDK version): Bug 1892, Bug 1897, Bug 1319. Also 3 or 4 documents were found to contain code blocks with SFL, which has been fixed. Partial fix for broken links, links to Forum Nokia, and the 'Symbian platform' terminology issues.
<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
<!-- This component and the accompanying materials are made available under the terms of the License
"Eclipse Public License v1.0" which accompanies this distribution,
and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
<!-- Initial Contributors:
Nokia Corporation - initial contribution.
Contributors:
-->
<!DOCTYPE concept
PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept xml:lang="en" id="GUID-3FAECE72-FCC6-5C58-B724-C9A9DE485788"><title>Converting to and from Transformation Formats Using CnvUtfConverter</title><prolog><metadata><keywords/></metadata></prolog><conbody><p>This section describes the way in which <codeph>CnvUtfConverter</codeph> can be used to convert chunks of text from Unicode to the UTF-7 or UTF-8 transformation format, from the UTF-7 or UTF-8 transformation format to Unicode, and from the UTF-7 or UTF-8 transformation format to Unicode when the text is arriving in fragments. </p> <section><title>Introduction</title> <p>The <codeph>CnvUtfConverter</codeph> class is provided for converting text between Unicode (UCS-2) and the two Unicode transformation formats UTF-7 and UTF-8. The four conversion functions are all static, are <codeph>ConvertFromUnicodeToUtf7()</codeph>, <codeph>ConvertFromUnicodeToUtf8()</codeph>, <codeph>ConvertToUnicodeFromUtf7()</codeph>, and <codeph>ConvertToUnicodeFromUtf8()</codeph>. </p> <p> <b>Note</b>: there is no direct way to convert between UTF-7 and UTF-8. Though it is possible to convert from one transformation format to UCS-2 and then from UCS-2 to the other transformation format. </p> <p>The examples in the below sections describes how to convert to and from UTF-7. Converting to and from UTF-8 is almost identical. The only differences being in the use of a third parameter to <codeph>ConvertFromUnicodeToUtf7()</codeph>, and also that <codeph>ConvertToUnicodeFromUtf8()</codeph> does not use a ‘state’ parameter. </p> </section> <section><title>Converting Unicode to UTF-7</title> <p>The text is converted in chunks, and it is not necessary to guess the full size of the converted output text in advance. </p> <ol id="GUID-481722AF-CD6D-5383-B381-EF9C79416AE0"><li id="GUID-CFB5B1ED-749D-518D-84E7-62E345BD99D1"><p>The function below creates an output descriptor and a ‘remainder’ buffer for holding the unconverted Unicode characters. This remainder buffer is initialised with the text in the input descriptor. </p> <codeblock id="GUID-B19F8A17-A52D-57C6-8801-522408FD78E0" xml:space="preserve">LOCAL_C void EncodeL(const TDesC16& aUnicodeText)
{
// Create a small output buffer
TBuf8<20> outputBuffer;
// Create a buffer for the unconverted text - initialised with the input text
TPtrC16 remainderOfUnicodeText(aUnicodeText);</codeblock> </li> <li id="GUID-FEB17712-19E8-5609-A785-5520CCD3C040"><p>A loop is set up to convert the text in the remainder buffer — which initially contains all the information in the input descriptor. </p> <p> <codeph>ConvertFromUnicodeToUtf7()</codeph> converts characters from the remainder buffer until the small output buffer is full — the Unicode contents of the output buffer are then safely stored. The remainder buffer is reset so that it only contains unconverted text. This process is repeated until the remainder buffer is empty, and the function completes. Note that the code fragment below also includes code to check for corrupted characters. </p> <codeblock id="GUID-DBE3B2D3-BCF9-5D97-9E09-9E925FE8E4BF" xml:space="preserve"> for(;;) // conversion loop
{
// Start conversion. When the output buffer is full, return the
// number of characters that were not converted
const TInt returnValue=CnvUtfConverter::ConvertFromUnicodeToUtf7(outputBuffer,
remainderOfUnicodeText, ETrue);
// check to see that the descriptor isn’t corrupt - leave if it is
if (returnValue==CnvUtfConverter::EErrorIllFormedInput)
User::Leave(KErrCorrupt);
else if (returnValue<0) // future-proof against "TError" expanding
User::Leave(KErrGeneral);
// Do something here to store the contents of the output buffer.
// Finish conversion if there are no unconverted characters in the remainder buffer
if (returnValue==0)
break;
// Remove the converted source text from the remainder buffer.
// The remainder buffer is then fed back into loop
remainderOfUnicodeText.Set(remainderOfUnicodeText.Right(returnValue));
}
}</codeblock> </li> </ol> </section> <section><title>Converting UTF-7 to Unicode</title> <p>The text is converted in chunks so that it is not necessary to guess the full size of the converted output text in advance. The process is similar to that given in the previous example. </p> <ul><li id="GUID-396AAFEB-0A49-5532-ABC4-00EC534C5151"><p>The function below first creates an output descriptor, a state variable, and a ‘remainder’ buffer for holding the unconverted characters — this is initialised with the text in the input descriptor. The state variable is initialised with <codeph>CnvUtfConverter::KStateDefault</codeph> — after initialisation this should not be tampered with, but simply be passed into subsequent calls to <codeph>ConvertToUnicode()</codeph>. </p> <codeblock id="GUID-7B5BED89-3E99-5FBE-BEAC-AD978900A14E" xml:space="preserve">LOCAL_C void DecodeL(const TDesC8& aUtf7)
{
// Create a small output buffer
TBuf16<20> outputBuffer;
// Create a buffer for the unconverted text - initialised with the input text
TPtrC8 remainderOfUtf7(aUtf7);
// Create a "state" variable and initialise it with CnvUtfConverter::KStateDefault
// After initialisation the state variable must not be tampered with.
// Simply pass into each subsequent call of ConvertToUnicodeFromUtf7()
TInt state=CnvUtfConverter::KStateDefault;</codeblock> </li> <li id="GUID-902EA75D-8237-5408-B8BE-7EAEBDE611F2"><p>A loop is then set up to convert the text in the remainder buffer — which initially contains all the information in the input descriptor. </p> <p> <codeph>ConvertToUnicodeFromUtf7()</codeph> converts characters from the remainder buffer until the small output buffer is full — the Unicode contents of the output buffer are then safely stored. The remainder buffer is reset so that it only contains unconverted text. This process is repeated until the remainder buffer is empty, and the function completes. Note that the code fragment below also includes code to check for corrupted characters. </p> <codeblock id="GUID-8AEE6B69-E867-534D-AF83-7A77E1E3F0BD" xml:space="preserve"> for(;;) // conversion loop
{
// Start conversion. When the output buffer is full, return the
// number of characters that were not converted
const TInt returnValue=CnvUtfConverter::ConvertToUnicodeFromUtf7(outputBuffer,remainderOfUtf7,state);
// check to see that the descriptor isn't corrupt - leave if it is
if (returnValue==CnvUtfConverter::EErrorIllFormedInput)
User::Leave(KErrCorrupt);
else if (returnValue<0) // future-proof against "TError" expanding
User::Leave(KErrGeneral);
// Do something here to store the contents of the output buffer.
// Finish conversion if there are no unconverted characters in the remainder buffer
if (returnValue==0)
break;
// Remove the converted source text from the remainder buffer.
// The remainder buffer is then fed back into loop
remainderOfUtf7.Set(remainderOfUtf7.Right(returnValue));
}
}</codeblock> </li> </ul> </section> <section><title>Converting fragmented UTF-7 to Unicode</title> <p>The main difficulty in converting fragmented text is that received chunks may begin or end with bytes from an incomplete character. </p> <p>To overcome this problem, implementations must ensure that the descriptors passed to <codeph>ConvertToUnicodeFromUtf7()</codeph> always begin with a complete character, (making the output descriptor at least 20 elements long should be enough to ensure this) and that conversions only progress to completion for the final chunk of text— in which the last character is complete. In the function below this is achieved by beginning the buffer for each chunk with a small amount of unconverted text from the previous chunk. The buffer is then guaranteed to begin with a complete character. Any ‘loose’ bytes from the end of the last chunk and the beginning of the new one combine to form a complete character. </p> <p>The function first creates a buffer to hold both the unconverted text fragment from the previous chunk, and the new chunk. </p> <codeblock id="GUID-E6FA1A78-EDD9-504E-B4D7-956B306D4705" xml:space="preserve">LOCAL_C void DecodeL()
{
// Create a buffer for holding UTF-7 text to be converted
const TInt KMaximumLengthOfBufferForUtf7=200;
TUint8 bufferForUtf7[KMaximumLengthOfBufferForUtf7];
// Create a variable for indicating the actual amount of text in the buffer
TInt lengthOfBufferForUtf7=0;</codeblock> <p>A loop is then set up to get a new chunk and append it to the unconverted text fragment from the previous chunk. The code also contains a placeholder to find out whether the current chunk is the last chunk, and stores this information in a flag. In addition, it creates an output descriptor, a state variable, and a ‘remainder’ buffer for holding the unconverted text. The state variable is initialised with <codeph>CnvUtfConverter::KStateDefault</codeph> — after initialisation this should not be tampered with, but simply be passed into subsequent calls to <codeph>ConvertToUnicodeFromUtf7()</codeph>. </p> <codeblock id="GUID-BF7937D9-1E25-5B57-9DDB-712D6CAFBB11" xml:space="preserve"> // Outer loop.
// Appends new chunk to fragment from previous chunk
// Then passes the buffer to the conversion loop.
for(;;)
{
// Create a modifiable pointer descriptor for the next chunk of non-Unicode text
TPtr8 nextChunkOfUtf7(bufferForUtf7+lengthOfBufferForUtf7,
KMaximumLengthOfBufferForUtf7-lengthOfBufferForUtf7);
// Insert code to load next chunk of Utf7 here
// Calculate the length of the next chunk of text to be processed
const TInt lengthOfNextChunkOfUtf7=nextChunkOfUtf7.Length();
// Specify the length of the buffer for Utf7 text
lengthOfBufferForUtf7+=lengthOfNextChunkOfUtf7;
// Set whether this is the last chunk - find out from source of text
const TBool isLastChunkOfUtf7= ; // ?
// e.g. the source may define that the last chunk is of length zero, in which case
// the expression "(lengthOfNextChunkOfUtf7==0)" would be assigned to
// this variable; note that even if the length of this chunk is zero,
// dont just exit this function here as bufferForUtf7
// may not be empty (i.e. lengthOfBufferForUtf7>0)
// Create a small output buffer
TBuf16<20> outputBuffer;
// Create a remainder buffer for the unconverted text - used in conversion loop
TPtrC8 remainderOfUtf7(bufferForUtf7, lengthOfBufferForUtf7);
// Create a "state" variable and initialise it with CnvUtfConverter::KStateDefault
// After initialisation the state variable must not be tampered with.
// Simply pass into each subsequent call of ConvertToUnicodeFromUtf7()
TInt state=CnvUtfConverter::KStateDefault;</codeblock> <p>The remainder buffer is passed to the conversion loop. </p> <p> <codeph>ConvertToUnicodeFromUtf7()</codeph> converts characters from the remainder buffer until the output buffer is full — the Unicode contents of the output buffer are then safely stored. Then the remainder buffer is reset so that it only contains unconverted text. This process is repeated until the remainder buffer contains less than 20 bytes. The remainder of the unconverted bytes are copied into the main UTF-7 buffer, and the function returns to the outer loop. The process then repeats itself, with a new chunk being added to the UTF-7 buffer etc. </p> <p>If the ‘last chunk’ flag is set— in the main loop— then the conversion continues until the remainder buffer is empty. The function then completes. The code fragment below also includes code to check for corrupted characters. </p> <codeblock id="GUID-C89BA03E-493A-5F17-8B65-BA6BD92A8368" xml:space="preserve">// The conversion loop. This loop takes chunks of text prepared by the previous loop and converts them
for(;;)
{
const TInt lengthOfRemainderOfUtf7=remainderOfUtf7.Length();
if (isLastChunkOfUtf7)
{
if (lengthOfRemainderOfUtf7==0)
return; // the single point of exit of this function
}
else
{
// As this isn't the last chunk, we don't want ConvertToUnicodeFromUtf7 to return
// CnvUtfConverter::EErrorIllFormedInput if the input descriptor ends with an
// incomplete sequence - but it will only do this if *none* of the input
// descriptor can be consumed. Therefore if the input descriptor is long enough
// (20 elements or longer is adequate) there is no danger of this error
// being returned for this reason. If it's shorter than that, we'll simply put it
// at the start of the buffer so that it gets converted with the next chunk.
if (lengthOfRemainderOfUtf7<20)
{
// put any remaining UTF-7 at the start of bufferForUtf7
lengthOfBufferForUtf7=lengthOfRemainderOfUtf7;
Mem::Copy(bufferForUtf7, remainderOfUtf7.Ptr(), lengthOfBufferForUtf7);
break;
}
}
const TInt returnValue=CnvUtfConverter::ConvertToUnicodeFromUtf7(outputBuffer, remainderOfUtf7, state);
if (returnValue==CnvUtfConverter::EErrorIllFormedInput)
User::Leave(KErrCorrupt);
else if (returnValue<0) // future-proof against "TError" expanding
User::Leave(KErrGeneral);
// Do something here to store the contents of the output buffer.
remainderOfUtf7.Set(remainderOfUtf7.Right(returnValue));
}
}
}</codeblock> </section> </conbody></concept>