43 |
43 |
44 /*! |
44 /*! |
45 @stable |
45 @stable |
46 @hbcore |
46 @hbcore |
47 \class HbStringUtil |
47 \class HbStringUtil |
48 \brief The HbStringUtil class can be used to execute operations on strings, |
48 \brief The HbStringUtil class supports locale-based comparing and ordering of strings, and digit conversion. |
49 such as comparisons and finding data sequences. |
49 |
50 |
50 \section _hbstringutil_stringcomparison Comparing and ordering strings |
51 \sa HbStringUtil |
51 |
|
52 Using HbStringUtil, you can execute locale-aware operations on collated |
|
53 strings, such as comparisons and finding data sequences in target strings. |
|
54 HbStringUtil also provides similar functions for operations on folded data. |
|
55 |
|
56 Choose the right HbStringUtil functions for your purpose, depending on |
|
57 whether you want to operate on collated or folded data: |
|
58 |
|
59 <ul> |
|
60 |
|
61 <li>Because different languages have different alphabets, they also require |
|
62 different ways of ordering strings. The technical term for ordering and |
|
63 comparing strings in a locale-specific manner is \b collating. When |
|
64 comparing strings in natural languages, it is recommended that you use the |
|
65 HbStringUtil collation functions: matchC(), compareC() and findC(), and |
|
66 sort() for operations on multiple strings. |
|
67 |
|
68 For example, for languages using the Latin script, collation defines when |
|
69 punctuation should be ignored, how accents are handled, and so on. A |
|
70 locale's settings usually include rules for collation. |
|
71 |
|
72 As an example of handling accents, in German, the letter "o with umlaut" or |
|
73 "ö" is just a variation of "o", whereas in Swedish it is the letter "ö", |
|
74 which comes last in the alphabet. There can also be country-specific |
|
75 variations within a language, for example, between Iberian Spanish and Latin |
|
76 American Spanish.</li> |
|
77 |
|
78 <li>\b Folding normalises text for comparison, for example, by removing case |
|
79 distinctions and accents from characters. This can be useful, for example, |
|
80 if you need to determine if two file names are identical. |
|
81 |
|
82 Folding is not locale-aware. In particular, folding cannot handle |
|
83 character relationships that are not one-to-one, for example, mapping from |
|
84 the German uppercase SS to the lowercase ß. When you need to compare |
|
85 strings in natural languages, use collation instead.</li> |
|
86 |
|
87 </ul> |
|
88 |
|
89 \section _hbstringutil_digitconversion Digit conversion |
|
90 |
|
91 Internal processing of numeric expressions uses Latin digits. To display |
|
92 digits correctly to the user, you need to convert them to the appropriate |
|
93 digit type. |
|
94 |
|
95 If you are inserting numbers into a localizable string at runtime using |
|
96 arguments, you can localize the digit type used for the numbers using the \c |
|
97 L notation in the placeholder in the string (see hbTrId()). However, this |
|
98 only works for integers. When displaying more complex numeric expressions |
|
99 such as dates and times, use HbStringUtil to convert the digits to the digit |
|
100 type appropriate to the UI language. |
|
101 |
|
102 \note In internal processing, converting numbers from one digit type to |
|
103 another is not always allowed. For example, native digits are filtered out |
|
104 of IP addresses, and native digits in phone numbers are converted and sent |
|
105 to the network as Latin digits. This is because different networks may not |
|
106 be able to handle native digits. |
|
107 |
|
108 HbExtendedLocale, QLocale and HbNumberGrouping also provide functions for |
|
109 locale-dependent number formatting. |
|
110 |
|
111 \sa HbLocaleUtil |
52 */ |
112 */ |
53 |
113 |
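The digit conversion described in the class documentation can be illustrated without Qt. The following is a minimal sketch, not the HbStringUtil implementation: `DigitKind`, `digitValue()` and `convertDigitsSketch()` are hypothetical names, and the sketch assumes that each supported script stores its ten digits contiguously, starting from the zero digit (U+0030, U+0660, U+06F0 and U+0966).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HbStringUtil::DigitType. Each value is the
// Unicode code point of that script's zero digit.
enum class DigitKind : char32_t {
    Western = 0x0030,
    ArabicIndic = 0x0660,
    EasternArabicIndic = 0x06F0,
    Devanagari = 0x0966
};

// Returns the numeric value (0-9) of ch if it is a digit in one of the four
// scripts above, or -1 if it is not a digit in any of them.
int digitValue(char32_t ch)
{
    const char32_t zeros[] = { 0x0030, 0x0660, 0x06F0, 0x0966 };
    for (char32_t zero : zeros) {
        if (ch >= zero && ch <= zero + 9) {
            return static_cast<int>(ch - zero);
        }
    }
    return -1;
}

// Rewrites every recognised digit into the target script and leaves all
// other characters (separators, letters) untouched.
std::u32string convertDigitsSketch(const std::u32string &text, DigitKind target)
{
    const char32_t targetZero = static_cast<char32_t>(target);
    std::u32string result;
    result.reserve(text.size());
    for (char32_t ch : text) {
        const int value = digitValue(ch);
        if (value < 0) {
            result.push_back(ch);
        } else {
            result.push_back(static_cast<char32_t>(targetZero + static_cast<char32_t>(value)));
        }
    }
    return result;
}

// Small self-check: a date converted to Arabic-Indic digits and back.
void demoConvertDigits()
{
    const std::u32string date = U"07.09.2010";
    const std::u32string arabic = convertDigitsSketch(date, DigitKind::ArabicIndic);
    assert(arabic == U"\u0660\u0667.\u0660\u0669.\u0662\u0660\u0661\u0660");
    assert(convertDigitsSketch(arabic, DigitKind::Western) == date);
}
```

Because the sketch recognises digits from any of the four scripts, it covers the Latin-to-native, native-to-Latin and native-to-native directions with one loop.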
54 /*! |
114 /*! |
55 \enum HbStringUtil::Option |
115 \enum HbStringUtil::Option |
56 |
116 |
57 This enum describes the way collation is done for matchC, compareC |
117 Defines the collation levels ('flags') for matchC() and compareC(). This is |
58 Pass one of these values to setReadChannel() to set the |
118 based on the Symbian TCollationMethod enumeration, and it is not used on |
59 current read channel of QProcess. |
119 other platforms. |
60 |
120 |
61 \value Default Use the default System flags. |
121 */ |
62 |
122 |
63 \value IgnoreNone Don't ignore anything. |
123 /*! |
64 |
124 \var HbStringUtil::Default |
65 \value SwapCase Reverse case ordering. |
125 Use the default system flags. |
66 |
126 |
67 \value AccentsBackwards Compare secondary keys which represent accents in reverse order. |
127 */ |
68 |
128 |
69 \value SwapKana Reverse order for katakana/hiragana. |
129 /*! |
70 |
130 \var HbStringUtil::IgnoreNone |
71 \value FoldCase Fold to lower case, file comparisons. |
131 |
72 |
132 Do not ignore any keys (by default, for example, punctuation marks and |
73 \value MatchingTable Table used for matching. |
133 spaces are ignored). |
74 |
134 |
75 \value IgnoreCombining Ignore check for adjacent combining characters. |
135 */ |
76 |
136 |
77 \sa compareC, matchC |
137 /*! |
78 */ |
138 \var HbStringUtil::SwapCase |
79 |
139 |
80 /*! |
140 Reverse the normal order for characters that only differ in case. |
81 Searches source string's collated data for a |
141 |
82 match with collated data supplied in pattern string |
142 |
|
143 */ |
|
144 |
|
145 /*! |
|
146 \var HbStringUtil::AccentsBackwards |
|
147 |
|
148 Compare secondary keys representing accents in reverse order (from right to |
|
149 left). This is needed for French when comparing words that only differ in |
|
150 accents. |
|
151 |
|
152 |
|
153 */ |
|
154 |
|
155 /*! |
|
156 \var HbStringUtil::SwapKana |
|
157 |
|
158 Reverse the normal order for characters that only differ in whether they are |
|
159 Katakana or Hiragana. |
|
160 |
|
161 |
|
162 */ |
|
163 |
|
164 /*! |
|
165 \var HbStringUtil::FoldCase |
|
166 |
|
167 Fold all characters to lowercase before extracting keys. This is needed when |
|
168 comparing file names; case is ignored but other Unicode collation level 2 |
|
169 distinctions are not. |
|
170 |
|
171 */ |
|
172 |
|
173 /*! |
|
174 \var HbStringUtil::MatchingTable |
|
175 |
|
176 Specify the collation method to be used for matching. |
|
177 |
|
178 */ |
|
179 |
|
180 /*! |
|
181 \var HbStringUtil::IgnoreCombining |
|
182 |
|
183 Ignore a check for adjacent combining characters. A combining character |
|
184 effectively changes the character it combines with into another character, |
|
185 which means a match does not occur. Setting this flag allows character |
|
186 matching regardless of any combining characters. |
|
187 |
|
188 */ |
|
189 |
|
190 /*! |
83 |
191 |
84 \attention Cross-Platform API |
192 Searches \a strFrom for a match of \a strToMatch based on the locale's |
85 |
193 collation settings. You can optionally specify the level of collation with |
86 \param strFrom Source string. |
194 \a maxLevel and \a flags, and the wild card and escape characters for the |
87 \param strToMatch Pattern string. |
195 search with \a wildChar, \a wildSequenceChar and \a escapeChar. If the |
88 \param maxLevel Determines the tightness of the collation. |
196 parameters are not specified, the default values are used. |
89 Level 0 - Character identity; |
197 |
90 Level 1 - Character identity and accents; |
198 \param strFrom The source string. |
91 Level 2 - Character identity, accents and case; |
199 \param strToMatch The string whose data is to be searched within the source |
92 Level 3 - Character identity, accents, case and Unicode value; |
200 string. The value can contain the wildcard characters "*" and "?", where "*" |
93 \param flags The flags that will be used. Default value is Default. |
201 matches zero or more consecutive occurrences of any character, and "?" |
94 \param wildChar Wild card character. |
202 matches a single occurrence of any character (default). |
95 \param wildSequenceChar Wild card sequence character. |
203 \param maxLevel (optional) The level of collation. Possible values: |
96 \param escapeChar The escape character, for example, '?', '*'. |
204 - \c 0: Only character identities are distinguished. |
97 |
205 - \c 1: Character identities and accents are distinguished. |
98 \return If a match is found the offset within source string's |
206 - \c 2: Character identities, accents, and case are distinguished. |
99 data where the match first occurs. -1 if match is not found. |
207 - \c 3: All valid Unicode characters are considered different. |
|
208 - For details, see Symbian documentation on collation, for example, as used |
|
209 by TDesC16::MatchC(). |
|
210 \param flags A list of (comma-separated) HbStringUtil::Option flags that |
|
211 will be used. The default value is \c Default. |
|
212 \param wildChar (optional) Wild card character ('?' by default). |
|
213 \param wildSequenceChar (optional) Wild card sequence character ('*' by default). |
|
214 \param escapeChar (optional) The escape character, for example, '?', '*' or '\\\\' (default). |
|
215 |
|
216 \return The offset from the beginning of \a strFrom where the match first |
|
217 occurs. If the data sequence in \a strToMatch is not found, returns -1. |
100 |
218 |
101 Example: |
219 Example: |
102 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,3} |
220 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,3} |
|
221 |
|
222 \attention On the Symbian platform, this function uses a Symbian-specific |
|
223 collation match. On other platforms, the search is not locale-based, and |
|
224 only the \a strFrom and \a strToMatch parameters are used. |
|
225 |
|
226 \sa findC() |
103 */ |
227 */ |
104 int HbStringUtil::matchC( const QString &strFrom, const QString &strToMatch, |
228 int HbStringUtil::matchC( const QString &strFrom, const QString &strToMatch, |
105 int maxLevel, Options flags, |
229 int maxLevel, Options flags, |
106 int wildChar, int wildSequenceChar, int escapeChar ) |
230 int wildChar, int wildSequenceChar, int escapeChar ) |
107 { |
231 { |
108 #if defined( Q_OS_SYMBIAN ) |
232 #if defined( Q_OS_SYMBIAN ) |
109 TPtrC s1Ptr( strFrom.utf16() ); |
233 TPtrC s1Ptr( strFrom.utf16() ); |
110 TPtrC s2Ptr( strToMatch.utf16() ); |
234 TPtrC s2Ptr( strToMatch.utf16() ); |
111 |
235 |
112 if ( (maxLevel < 0) || (maxLevel > 3) ) { |
236 if ( (maxLevel < 0) || (maxLevel > 3) ) { |
113 maxLevel = 0; |
237 maxLevel = 0; |
114 } |
238 } |
115 if ( (flags < 0) || (flags > 127) ) { |
239 if ( (flags < 0) || (flags > 127) ) { |
116 flags = Default; |
240 flags = Default; |
117 } |
241 } |
118 |
242 |
119 TCollationMethod m = *Mem::GetDefaultMatchingTable(); |
243 TCollationMethod m = *Mem::GetDefaultMatchingTable(); |
128 Q_UNUSED(escapeChar); |
252 Q_UNUSED(escapeChar); |
129 #ifdef QT_NO_REGEXP |
253 #ifdef QT_NO_REGEXP |
130 // If regular expressions are not available, fall back to a plain substring search |
254 // If regular expressions are not available, fall back to a plain substring search |
131 return strFrom.indexOf( strToMatch, 0, Qt::CaseSensitive ); |
255 return strFrom.indexOf( strToMatch, 0, Qt::CaseSensitive ); |
132 #else |
256 #else |
133 // works with standard wildcards is not correct |
257 // works with standard wildcards |
134 QRegExp locStrToMatch( strToMatch, Qt::CaseSensitive, QRegExp::Wildcard ); |
258 QRegExp locStrToMatch( strToMatch, Qt::CaseSensitive, QRegExp::Wildcard ); |
135 return strFrom.indexOf( locStrToMatch, 0 ); |
259 return strFrom.indexOf( locStrToMatch, 0 ); |
136 #endif |
260 #endif |
137 |
261 |
138 #endif |
262 #endif |
139 } |
263 } |
140 |
264 |
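To make the wildcard and return-value semantics documented for matchC() concrete, here is a minimal standard C++ sketch. It is an illustration under assumptions, not the library's algorithm: `matchesAt()` and `wildcardOffset()` are hypothetical helpers, the search is not locale-aware, escape characters are not handled, and it reports the earliest offset at which the pattern matches some substring, which mirrors the non-Symbian fallback that calls indexOf() with a wildcard expression. The Symbian collation match may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if pattern[pi..] matches some prefix of text[ti..].
// '*' matches zero or more characters, '?' matches exactly one character.
bool matchesAt(const std::string &text, std::size_t ti,
               const std::string &pattern, std::size_t pi)
{
    if (pi == pattern.size()) {
        return true; // Pattern consumed: the rest of the text is irrelevant.
    }
    if (pattern[pi] == '*') {
        // Let '*' absorb 0, 1, 2, ... characters and try the rest each time.
        for (std::size_t k = ti; k <= text.size(); ++k) {
            if (matchesAt(text, k, pattern, pi + 1)) {
                return true;
            }
        }
        return false;
    }
    if (ti == text.size()) {
        return false; // Pattern still needs a character but the text ended.
    }
    if (pattern[pi] == '?' || pattern[pi] == text[ti]) {
        return matchesAt(text, ti + 1, pattern, pi + 1);
    }
    return false;
}

// Returns the first offset in text where pattern matches, or -1 if none.
int wildcardOffset(const std::string &text, const std::string &pattern)
{
    for (std::size_t start = 0; start <= text.size(); ++start) {
        if (matchesAt(text, start, pattern, 0)) {
            return static_cast<int>(start);
        }
    }
    return -1;
}

// Small self-check of the offset and "not found" conventions.
void demoWildcardOffset()
{
    assert(wildcardOffset("hello world", "w?r*d") == 6);
    assert(wildcardOffset("hello", "xyz") == -1);
}
```

The recursion is simple rather than fast; patterns with many `*` characters can take exponential time, which is acceptable for a sketch but not for production code.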
141 /*! |
265 /*! |
142 Compares source string's data with the other string's |
266 |
143 data using the specified collation method. |
267 Compares \a string1 with \a string2 based on the locale's collation |
144 |
268 settings. You can optionally specify the level of collation with \a maxLevel |
145 \attention Cross-Platform API |
269 and \a flags. If the parameters are not specified, the default values are |
146 |
270 used. |
147 \param string1 Source string. |
271 |
148 \param string2 String whose data is to be compared with the source string. |
272 \param string1 The source string. |
149 \param maxLevel Maximum level to use for comparing. |
273 \param string2 The string whose data is to be compared with the source string. |
150 Level 0 - Character identity; |
274 \param maxLevel (optional) The level of collation. Possible values: |
151 Level 1 - Character identity and accents; |
275 - \c 0: Only character identities are distinguished. |
152 Level 2 - Character identity, accents and case; |
276 - \c 1: Character identities and accents are distinguished. |
153 Level 3 - Character identity, accents, case and Unicode value; |
277 - \c 2: Character identities, accents, and case are distinguished. |
154 \param flags The flags that will be used. Default value is Default. |
278 - \c 3: All valid Unicode characters are considered different (default). |
155 |
279 - For details, see Symbian documentation on collation, for example, as used |
156 \return Positive if source string is greater, negative if it is less and |
280 by TDesC16::CompareC(). |
157 zero if the content of both strings match. |
281 \param flags (optional) A list of (comma-separated) HbStringUtil::Option flags that |
|
282 will be used. The default value is \c Default. |
|
283 |
|
284 \return Positive if \a string1 is greater (that is, comes after \a |
|
285 string2 when the strings are ordered), negative if \a string2 is greater, |
|
286 and zero if the content of the strings matches. |
158 |
287 |
159 Example: |
288 Example: |
160 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,1} |
289 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,1} |
|
290 |
|
291 \attention Locale-specific collation settings are used, and the return value |
|
292 may vary on different platforms. The \a maxLevel and \a flags parameters are |
|
293 not used on platforms other than Symbian. |
|
294 |
|
295 \sa compareF() |
161 */ |
296 */ |
162 int HbStringUtil::compareC( const QString &string1, const QString &string2, |
297 int HbStringUtil::compareC( const QString &string1, const QString &string2, |
163 int maxLevel, Options flags ) |
298 int maxLevel, Options flags ) |
164 { |
299 { |
165 #if defined( Q_OS_SYMBIAN ) |
300 #if defined( Q_OS_SYMBIAN ) |
166 TPtrC s1Ptr(string1.utf16()); |
301 TPtrC s1Ptr(string1.utf16()); |
167 TPtrC s2Ptr(string2.utf16()); |
302 TPtrC s2Ptr(string2.utf16()); |
168 |
303 |
169 if ( (maxLevel < 0) || (maxLevel > 3) ) { |
304 if ( (maxLevel < 0) || (maxLevel > 3) ) { |
170 maxLevel = 3; |
305 maxLevel = 3; |
171 } |
306 } |
172 if ( (flags < 0) || (flags > 127) ) { |
307 if ( (flags < 0) || (flags > 127) ) { |
173 flags = Default; |
308 flags = Default; |
174 } |
309 } |
175 |
310 |
176 TCollationMethod m = *Mem::CollationMethodByIndex( 0 ); |
311 TCollationMethod m = *Mem::CollationMethodByIndex( 0 ); |
183 return string1.localeAwareCompare( string2 ); |
318 return string1.localeAwareCompare( string2 ); |
184 #endif |
319 #endif |
185 } |
320 } |
186 |
321 |
187 /*! |
322 /*! |
188 Searches for the first occurrence of the specified collated |
323 |
189 data sequence in the aStrFrom to the specified maximum |
324 Searches \a strFrom for the first occurrence of \a strToFind based on the |
190 collation level. |
325 locale's collation settings. You can optionally specify the collation level |
191 |
326 with \a maxLevel. If the parameter is not specified, the default value is |
192 \attention Cross-Platform API |
327 used. |
193 |
328 |
194 \param strFrom Source string. |
329 \param strFrom The source string. |
195 \param strToFind String whose data is to be compared with the source string. |
330 \param strToFind The string whose data is to be searched within the source string. |
196 \param maxLevel The maximum collation level. |
331 \param maxLevel (optional) The level of collation. Possible values: |
197 Level 0 - Character identity; |
332 - \c 0: Only character identities are distinguished (default). |
198 Level 1 - Character identity and accents; |
333 - \c 1: Character identities and accents are distinguished. |
199 Level 2 - Character identity, accents and case; |
334 - \c 2: Character identities, accents, and case are distinguished. |
200 Level 3 - Character identity, accents, case and Unicode value; |
335 - \c 3: All valid Unicode characters are considered different. |
201 |
336 - For details, see Symbian documentation on collation, for example, as used |
202 \return Offset of the data sequence from the beginning of the |
337 by TDesC16::FindC(). |
203 aStrFrom. -1 if the data sequence cannot be found. |
338 |
|
339 \return The offset from the beginning of \a strFrom where the match first |
|
340 occurs. If the data sequence in \a strToFind cannot be found, returns -1. If |
|
341 the length of \a strToFind is zero, returns zero. |
204 |
342 |
205 Example: |
343 Example: |
206 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,5} |
344 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,5} |
|
345 |
|
346 \attention On the Symbian platform, this function uses a Symbian-specific |
|
347 collation search. On other platforms, the search is not locale-based, and |
|
348 the \a maxLevel parameter is not used. |
|
349 |
|
350 \sa matchC() |
207 */ |
351 */ |
208 int HbStringUtil::findC( const QString &strFrom, |
352 int HbStringUtil::findC( const QString &strFrom, |
209 const QString &strToFind, |
353 const QString &strToFind, |
210 int maxLevel ) |
354 int maxLevel ) |
211 { |
355 { |
212 #if defined( Q_OS_SYMBIAN ) |
356 #if defined( Q_OS_SYMBIAN ) |
213 TPtrC s1Ptr( strFrom.utf16() ); |
357 TPtrC s1Ptr( strFrom.utf16() ); |
214 TPtrC s2Ptr( strToFind.utf16() ); |
358 TPtrC s2Ptr( strToFind.utf16() ); |
215 |
359 |
216 if ( (maxLevel < 0) || (maxLevel > 3) ) { |
360 if ( (maxLevel < 0) || (maxLevel > 3) ) { |
217 maxLevel = 0; |
361 maxLevel = 0; |
218 } |
362 } |
219 return s1Ptr.FindC( s2Ptr.Ptr(), |
363 return s1Ptr.FindC( s2Ptr.Ptr(), |
220 s2Ptr.Length(), |
364 s2Ptr.Length(), |
221 maxLevel ); |
365 maxLevel ); |
222 #else |
366 #else |
223 Q_UNUSED(maxLevel); |
367 Q_UNUSED(maxLevel); |
224 return strFrom.indexOf( strToFind, 0, Qt::CaseSensitive ); |
368 return strFrom.indexOf( strToFind, 0, Qt::CaseSensitive ); |
225 #endif |
369 #endif |
226 } |
370 } |
227 |
371 |
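The return-value convention documented for findC() (offset of the first occurrence, -1 when nothing is found, zero for an empty search string) can be sketched in standard C++ as follows. The helper names are hypothetical, and `foldAscii()` lowercases ASCII letters only; it is a stand-in for illustration and does not implement Unicode folding or locale-based collation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Case-sensitive search, comparable to the non-Symbian fallback path:
// returns the offset of the first occurrence, or -1 if there is none.
int findOffset(const std::string &source, const std::string &needle)
{
    const std::string::size_type pos = source.find(needle);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// Very rough stand-in for folding: lowercases ASCII letters only.
std::string foldAscii(std::string text)
{
    for (char &ch : text) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return text;
}

// Searches the folded source for the folded needle.
int findFoldedOffset(const std::string &source, const std::string &needle)
{
    return findOffset(foldAscii(source), foldAscii(needle));
}

// Small self-check of the three documented outcomes.
void demoFindOffset()
{
    assert(findOffset("collation", "lat") == 3);  // found
    assert(findOffset("collation", "LAT") == -1); // case differs, not found
    assert(findOffset("collation", "") == 0);     // empty search string
}
```

Folding both strings before searching is what lets `findFoldedOffset("Collation", "LAT")` succeed where the case-sensitive search does not.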
228 /*! |
372 /*! |
229 Searches source string's folded data for a |
373 |
230 match with folded data supplied in pattern string |
374 Searches the folded data in \a strFrom for a match with the folded data in |
231 |
375 \a strToMatch. |
232 \attention Cross-Platform API |
376 |
233 |
377 \param strFrom The source string. |
234 \param strFrom Source string. |
378 \param strToMatch The string whose data is to be searched within the source |
235 \param strToMatch Pattern string. |
379 string. The value can contain the wildcard characters "*" and "?", where "*" |
236 |
380 matches zero or more consecutive occurrences of any character, and "?" |
237 \return If a match is found the offset within source string's |
381 matches a single occurrence of any character. |
238 data where the match first occurs. -1 if match is not found. |
382 |
|
383 \return The offset from the beginning of \a strFrom where the match first |
|
384 occurs. If the data sequence in \a strToMatch is not found, returns -1. |
239 |
385 |
240 Example: |
386 Example: |
241 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,4} |
387 \snippet{unittest_hbstringutil/unittest_hbstringutil.cpp,4} |
|
388 |
|
389 \attention On the Symbian platform, this function uses a Symbian-specific |
|
390 folding match. On other platforms, the search is not locale-based. |
|
391 |
|
392 \sa findF() |
242 */ |
393 */ |
243 int HbStringUtil::matchF( const QString &strFrom, |
394 int HbStringUtil::matchF( const QString &strFrom, |
244 const QString &strToMatch ) |
395 const QString &strToMatch ) |
245 { |
396 { |
246 #if defined( Q_OS_SYMBIAN ) |
397 #if defined( Q_OS_SYMBIAN ) |
333 } |
495 } |
334 return ch; |
496 return ch; |
335 } |
497 } |
336 |
498 |
337 /*! |
499 /*! |
338 Converts digits to native digits based on current UI language. |
500 |
339 |
501 Converts digits to the native digits based on the current UI language. |
340 \attention Cross-Platform API |
502 |
341 |
503 \param str The digits to be converted. |
342 \param str digits to be converted. |
504 |
|
505 For example: |
|
506 |
|
507 \code |
|
508 QString date = "07.09.2010"; |
|
509 QString result = HbStringUtil::convertDigits(date); |
|
510 \endcode |
|
511 |
|
512 \sa convertDigitsTo(), QLocale::toString() |
343 */ |
513 */ |
344 QString HbStringUtil::convertDigits( const QString str ) |
514 QString HbStringUtil::convertDigits( const QString str ) |
345 { |
515 { |
346 HbExtendedLocale locale = HbExtendedLocale::system(); |
516 DigitType digitType = WesternDigit; |
347 DigitType digitType = WesternDigit; |
517 #if defined( Q_OS_SYMBIAN ) |
348 if (locale.language() == HbExtendedLocale::Arabic) { |
518 TExtendedLocale extLocale; |
349 digitType = ArabicIndicDigit; |
519 extLocale.LoadSystemSettings(); |
350 } else if (locale.language() == HbExtendedLocale::Persian || locale.language() == HbExtendedLocale::Urdu) { |
520 TDigitType type = extLocale.GetLocale()->DigitType(); |
351 digitType = EasternArabicIndicDigit; |
521 switch (type) |
352 } |
522 { |
353 QString converted = HbStringUtil::convertDigitsTo(str, digitType); |
523 case EDigitTypeArabicIndic: |
354 return converted; |
524 digitType = ArabicIndicDigit; |
355 } |
525 break; |
356 |
526 case EDigitTypeEasternArabicIndic: |
357 /*! |
527 digitType = EasternArabicIndicDigit; |
358 Converts the digit from Latin to native or native to latin or native to native |
528 break; |
359 |
529 default: |
360 \attention Cross-Platform API |
530 break; |
361 |
531 }; |
362 \param str digits to be converted. |
532 #else |
363 \param digitType type of the digit to be converted to |
533 HbExtendedLocale locale = HbExtendedLocale::system(); |
|
534 QChar zero = locale.zeroDigit(); |
|
535 if (zero == 0x660) { |
|
536 digitType = ArabicIndicDigit; |
|
537 } |
|
538 if (zero == 0x6F0) { |
|
539 digitType = EasternArabicIndicDigit; |
|
540 } |
|
541 #endif |
|
542 QString converted = HbStringUtil::convertDigitsTo(str, digitType); |
|
543 return converted; |
|
544 } |
|
545 |
|
546 /*! |
|
547 |
|
548 Returns digits converted into the specified digit type. If you need to |
|
549 process an integer, use QLocale::toString() to first convert it to a string. |
|
550 |
|
551 \param str The digits to be converted. |
|
552 \param digitType The digit type that the given digits are to be converted to. Possible values: |
|
553 - \c WesternDigit - Latin digits |
|
554 - \c ArabicIndicDigit - Arabic-Indic digits |
|
555 - \c EasternArabicIndicDigit - Eastern Arabic-Indic digits |
|
556 - \c DevanagariDigit - Devanagari digits |
|
557 |
|
558 \sa convertDigits(), QLocale::toString() |
364 */ |
559 */ |
365 QString HbStringUtil::convertDigitsTo( const QString str, const DigitType digitType ) |
560 QString HbStringUtil::convertDigitsTo( const QString str, const DigitType digitType ) |
366 { |
561 { |
367 QString convDigit; |
562 QString convDigit; |
368 int length = str.length(); |
563 int length = str.length(); |