--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Symbian3/PDK/Source/GUID-C501E703-E39D-598C-B962-7A32AC9091DD.dita Fri Jan 22 18:26:19 2010 +0000
@@ -0,0 +1,105 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
+<!-- This component and the accompanying materials are made available under the terms of the License
+"Eclipse Public License v1.0" which accompanies this distribution,
+and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
+<!-- Initial Contributors:
+ Nokia Corporation - initial contribution.
+Contributors:
+-->
+<!DOCTYPE concept
+ PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="GUID-C501E703-E39D-598C-B962-7A32AC9091DD" xml:lang="en"><title>Folding
+and collation (comparing strings)</title><shortdesc>Describes descriptor folding and descriptor collation.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
+<p>There are two techniques that may be used to modify the characters in a
+descriptor prior to performing operations such as comparisons on text strings: </p>
+<ul>
+<li id="GUID-C550B19E-0312-52F4-936C-95C53D4D4FA9"><p>folding </p> </li>
+<li id="GUID-77D0249A-F9B7-5189-808F-C4FB88BED5B3"><p>collation </p> </li>
+</ul>
+<section id="GUID-4AD769A8-A90B-4BE5-B514-DCE9C808C4A8"><title>Folding</title> <p>Folding is a relatively simple way of normalising
+text for comparison by removing case distinctions, converting accented characters
+to characters without accents etc. Folding is used for tolerant comparisons,
+i.e. comparisons that are biased towards a match. </p> <p>For example, the
+file system uses folding to decide whether two file names are identical or
+not. Folding is locale-independent, which means that the file system,
+for example, can also be locale-independent. </p> <p> <i>Note that folding
+is not guaranteed to be culturally appropriate in any way, so it should not
+be used for comparing strings in natural language; </i> <xref href="GUID-C501E703-E39D-598C-B962-7A32AC9091DD.dita#GUID-C501E703-E39D-598C-B962-7A32AC9091DD/GUID-F93D3C40-FDB4-5D92-A90C-736BB0225982">collation</xref> <i>is
+the correct functionality for this.</i> </p> <p>Variants of member functions
+that fold are provided where appropriate. For example, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-7BDF7FA1-39FF-35D2-97DE-12A223514345"><apiname>TDesC16::CompareF()</apiname></xref> performs a
+folded comparison. </p> <p>See also: </p><p><xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-7BDF7FA1-39FF-35D2-97DE-12A223514345"><apiname>TDesC16::CompareF()</apiname></xref>, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-57DED784-A51D-308B-888C-968EFB35B732"><apiname>TDesC16::MatchF()</apiname></xref>, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-D4BDA3FC-E11A-392B-A8A5-B468AC800396"><apiname>TDesC16::FindF()</apiname></xref>, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-F88740FB-C90A-30AF-AA19-E2260EB39A47"><apiname>TDesC16::LocateF()</apiname></xref>, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-BE28DE82-AEF1-3E71-A0E1-7A053095B5B0"><apiname>TDesC16::LocateReverseF()</apiname></xref></p> </section>
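+<example><title>Folded comparison: an illustrative sketch</title> <p>The folding variants described above might be used
+as in the following minimal sketch. The function names <codeph>SameFoldedName()</codeph> and
+<codeph>FoldingSketch()</codeph> and the string literals are hypothetical, chosen only for illustration;
+the descriptor calls are the ones listed under "See also". </p>
+<codeblock xml:space="preserve">#include &lt;e32std.h&gt;
+
+// Hypothetical helper (not part of any API): returns ETrue if the two
+// names are identical after folding.
+TBool SameFoldedName(const TDesC16&amp; aFirst, const TDesC16&amp; aSecond)
+    {
+    // CompareF() folds both descriptors before comparing them;
+    // a return value of 0 means that they match.
+    return aFirst.CompareF(aSecond) == 0;
+    }
+
+void FoldingSketch()
+    {
+    _LIT16(KUpper, "README.TXT");
+    _LIT16(KLower, "readme.txt");
+    _LIT16(KPart, "ME.");
+
+    // The two names differ only in case, so a folded comparison
+    // is expected to treat them as identical.
+    TBool same = SameFoldedName(KUpper, KLower);
+
+    // FindF() searches for folded text; it returns the offset of the
+    // match, or KErrNotFound if there is none.
+    TInt pos = KLower().FindF(KPart);
+    }</codeblock> <p>A comparison of this kind suits identifiers such as file names. For text in natural
+language, the collated variants are the appropriate choice. </p> </example>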
+<section id="GUID-F93D3C40-FDB4-5D92-A90C-736BB0225982"><title>Collation</title> <p>Collation
+is a more powerful way to compare strings than folding, and produces a dictionary-like
+('lexicographic') ordering. Folding cannot remove piece accents or deal with
+correspondences that are not one-to-one, such as the mapping from German upper
+case SS to lower case ß. In addition, folding cannot optionally ignore punctuation. </p> <p>For
+languages using the Latin script, for example, collation is about deciding
+whether to ignore punctuation, whether to fold upper and lower case, how to
+treat accents, and so on. In a given locale there is usually a standard set
+of collation rules that can be used. </p> <p> <i>Collation should always be
+used for comparing strings in natural language.</i> </p> <p>Variants of member
+functions that use collation are provided where appropriate. For example, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-8B44C890-6E64-37CF-B3D9-AEF9EFCBA284"><apiname>TDesC16::CompareC()</apiname></xref> performs a
+collated comparison. </p> <p><b>Comparing
+and sorting strings</b> </p> <p>The <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-8B44C890-6E64-37CF-B3D9-AEF9EFCBA284"><apiname>TDesC16::CompareC()</apiname></xref> variant
+prototyped as: </p> <codeblock id="GUID-42E8C509-DA19-50AD-8A80-D381F812639A" xml:space="preserve">TInt CompareC(const TDesC16&amp; aDes, TInt aMaxLevel, const TCollationMethod* aCollationMethod) const;</codeblock> <p>returns 0 if the two strings match. </p> <p>There are many ways in which
+two strings can match, even when they do not have the same length: </p> <ul>
+<li id="GUID-8BD9DEF8-3B24-576D-B634-0E38BA2D5859"><p>if one string includes
+combining characters, but the collation level is set to 0 (which means that
+accents are ignored) </p> </li>
+<li id="GUID-999CCABF-D8DC-5967-90F1-7739EDCAEFF9"><p>if one string contains
+"pre-composed" versions of accented characters and the other contains "decomposed"
+versions of the same characters </p> </li>
+<li id="GUID-9491E1CF-9BA3-52D6-8996-FC32BB54538A"><p>if one string contains
+a ligature that, in a collation table, matches multiple characters in the
+other string and the collation level is set to less than 3 (for example "æ"
+might match "ae") </p> </li>
+<li id="GUID-2D4FF198-7C41-5838-97FB-8A23A2D3DD49"><p>if one string contains
+a "surrogate pair" (a 32-bit encoded character) which happens to match a normal
+character at the level specified </p> </li>
+<li id="GUID-66B770C7-3215-559A-B70D-68FF3AC9DDAC"><p>if the collation method
+does not have its "ignore none" flag set and the collation level is set to
+less than 3, then spaces and punctuation are ignored; this means that one
+string could be much longer than the other just by adding a large number of
+spaces </p> </li>
+<li id="GUID-33026C64-3740-527A-97E0-DCBB6D7F087D"><p>if one string were to
+contain the Hangul representation of Korean and the other were to contain
+the Jamo representation of the same Korean and the collation level is set
+to less than 3. </p> </li>
+</ul> <p>The collation level is an integer that can take one of the values
+0, 1, 2 or 3, and determines how strictly two strings are matched.
+This value is passed as the second parameter to <codeph>CompareC()</codeph>.
+The values have the following meanings: </p> <ul>
+<li id="GUID-EDA2DAB4-E6B4-5815-ADC1-1BDF216D0C2E"><p>0 - only test the character
+identity; accents and case are ignored </p> </li>
+<li id="GUID-B0300A5F-6AF5-5722-B538-E0DBDE25576B"><p>1 - test the character
+identity and accents; case is ignored </p> </li>
+<li id="GUID-B6B7E302-120D-5240-B2AE-C14B19A36FF6"><p>2 - test the character
+identity, accents and case </p> </li>
+<li id="GUID-22E919E8-819D-5DE2-BABA-ECCC6F105B3C"><p>3 - test the Unicode
+value as well as the character identity, accents and case. </p> </li>
+</ul> <p>At levels 0-2: </p> <ul>
+<li id="GUID-E08F7C36-2E53-5348-A8D4-6A1E420CEB76"><p>ligatures (e.g. "æ")
+are the same as their decomposed equivalents (e.g. "ae") </p> </li>
+<li id="GUID-937B9898-878A-59BE-B185-A5347038F14C"><p>script variants are
+the same (for example "R" matches the mathematical real number symbol (Unicode
+211D)) </p> </li>
+<li id="GUID-6D1AF8EB-E8E8-565E-BA1E-CAED0382D801"><p>the "micro" symbol (Unicode
+00B5) matches Greek "mu" (Unicode 03BC). </p> </li>
+</ul> <p>At level 3 these are treated differently. </p> <p>If the aim is to <b>sort</b> strings,
+then <b>level 3 must be used</b>. For any strings <codeph>a</codeph> and <codeph>b</codeph>,
+if <codeph>a</codeph> &lt; <codeph>b</codeph> for some level of collation,
+then <codeph>a</codeph> &lt; <codeph>b</codeph> for all higher
+levels of collation as well. Using a collation level lower than 3 therefore cannot
+change the relative order of strings that differ at that level; it only causes
+strings that match at that level to sort in an arbitrary order. In standard English, sorting at
+level 3 gives the following order: </p> <p>bat &lt; bee &lt; BEE &lt; bus </p> <p>The
+case of the B only affects the comparison after all the letter identities
+have been found to be the same. This is usually what people are trying to
+achieve by using collation levels lower than 3 for sorting, so doing so is never necessary. </p> <p>The
+sort order can be affected by setting flags in the <xref href="GUID-78C4965C-BFCD-3E7E-8F46-2EE3D1BAF6EC.dita"><apiname>TCollationMethod</apiname></xref> object. </p> <p>Note
+that when strings match at level 3, they do not necessarily have the same
+binary representation, or even the same length. Unicode contains many strings
+that are regarded as equivalent, even though they have different binary representations. </p><p>
+ See also: </p><p><xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-8B44C890-6E64-37CF-B3D9-AEF9EFCBA284"><apiname>TDesC16::CompareC()</apiname></xref>, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-ACEEA02F-2594-3C61-B7A9-E96F0737C3AE"><apiname>TDesC16::MatchC()</apiname></xref>, <xref href="GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23.dita#GUID-440FF2B4-353B-3097-A2BA-5887D10B8B23/GUID-33D33034-0757-31F9-B3A2-BA351AADC816"><apiname>TDesC16::FindC()</apiname></xref></p> </section>
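+<example><title>Collated comparison: an illustrative sketch</title> <p>The collation levels described above might be
+applied as in the following minimal sketch. The function names <codeph>MatchesAtLevel0()</codeph> and
+<codeph>SortOrder()</codeph> are hypothetical. The sketch also assumes that passing <codeph>NULL</codeph> as
+the collation method selects the default method for the current locale; check the API reference
+for <codeph>CompareC()</codeph> before relying on this. </p>
+<codeblock xml:space="preserve">#include &lt;e32std.h&gt;
+
+// Hypothetical helper: tolerant match that ignores accents and case.
+TBool MatchesAtLevel0(const TDesC16&amp; aLeft, const TDesC16&amp; aRight)
+    {
+    // Level 0 tests character identity only.
+    // Assumption: NULL selects the default collation method.
+    return aLeft.CompareC(aRight, 0, NULL) == 0;
+    }
+
+// Hypothetical ordering function for sorting, using level 3.
+// Negative: aLeft sorts first; positive: aRight sorts first; 0: match.
+TInt SortOrder(const TDesC16&amp; aLeft, const TDesC16&amp; aRight)
+    {
+    return aLeft.CompareC(aRight, 3, NULL);
+    }</codeblock> <p>Given the English ordering shown earlier, <codeph>SortOrder()</codeph> would be expected to
+return a negative value for "bee" compared with "BEE", while <codeph>MatchesAtLevel0()</codeph> would be
+expected to report the same two strings as a match. </p> </example>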
+</conbody></concept>
\ No newline at end of file