:mod:`string` --- Common string operations
==========================================

.. module:: string
   :synopsis: Common string operations.


.. index:: module: re

The :mod:`string` module contains a number of useful constants and
classes, as well as some deprecated legacy functions that are also
available as methods on strings. In addition, Python's built-in string
classes support the sequence type methods described in the
:ref:`typesseq` section, and also the string-specific methods described
in the :ref:`string-methods` section. To output formatted strings use
template strings or the ``%`` operator described in the
:ref:`string-formatting` section. Also, see the :mod:`re` module for
string functions based on regular expressions.


String constants
----------------

The constants defined in this module are:


.. data:: ascii_letters

   The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
   constants described below. This value is not locale-dependent.


.. data:: ascii_lowercase

   The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``. This value is not
   locale-dependent and will not change.


.. data:: ascii_uppercase

   The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. This value is not
   locale-dependent and will not change.


.. data:: digits

   The string ``'0123456789'``.


.. data:: hexdigits

   The string ``'0123456789abcdefABCDEF'``.


.. data:: letters

   The concatenation of the strings :const:`lowercase` and :const:`uppercase`
   described below. The specific value is locale-dependent, and will be updated
   when :func:`locale.setlocale` is called.


.. data:: lowercase

   A string containing all the characters that are considered lowercase letters.
   On most systems this is the string ``'abcdefghijklmnopqrstuvwxyz'``. Do not
   change its definition --- the effect on the routines :func:`upper` and
   :func:`swapcase` is undefined. The specific value is locale-dependent, and will
   be updated when :func:`locale.setlocale` is called.


.. data:: octdigits

   The string ``'01234567'``.


.. data:: punctuation

   String of ASCII characters which are considered punctuation characters in the
   ``C`` locale.


.. data:: printable

   String of characters which are considered printable. This is a combination of
   :const:`digits`, :const:`letters`, :const:`punctuation`, and
   :const:`whitespace`.


.. data:: uppercase

   A string containing all the characters that are considered uppercase letters.
   On most systems this is the string ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. Do not
   change its definition --- the effect on the routines :func:`lower` and
   :func:`swapcase` is undefined. The specific value is locale-dependent, and will
   be updated when :func:`locale.setlocale` is called.


.. data:: whitespace

   A string containing all characters that are considered whitespace. On most
   systems this includes the characters space, tab, linefeed, return, formfeed, and
   vertical tab. Do not change its definition --- the effect on the routines
   :func:`strip` and :func:`split` is undefined.


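As a rough illustration of how the locale-independent constants can be used,
the sketch below checks whether a string consists only of hexadecimal digits.
The helper name ``is_hex_literal`` is hypothetical and not part of the module.

```python
import string


def is_hex_literal(text):
    """Return True if text is non-empty and contains only hexadecimal digits.

    Hypothetical helper for illustration; it is not part of the string module.
    """
    return bool(text) and all(ch in string.hexdigits for ch in text)


# ascii_letters is ascii_lowercase followed by ascii_uppercase.
letter_count = len(string.ascii_letters)
deadbeef_is_hex = is_hex_literal("DEADbeef")
xyz_is_hex = is_hex_literal("xyz")
```

Only the ``ascii_*`` constants, ``digits``, ``hexdigits`` and ``octdigits`` are
used here, because their values do not depend on the current locale.
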
.. _new-string-formatting:

String Formatting
-----------------

Starting in Python 2.6, the built-in :class:`str` and :class:`unicode` classes
provide the ability to do complex variable substitutions and value formatting
via the :meth:`str.format` method described in :pep:`3101`. The
:class:`Formatter` class in the :mod:`string` module allows you to create and
customize your own string formatting behaviors using the same implementation as
the built-in :meth:`format` method.

.. class:: Formatter

   The :class:`Formatter` class has the following public methods:

   .. method:: format(format_string, *args, **kwargs)

      :meth:`format` is the primary API method. It takes a format template
      string, and an arbitrary set of positional and keyword arguments.
      :meth:`format` is just a wrapper that calls :meth:`vformat`.

   .. method:: vformat(format_string, args, kwargs)

      This function does the actual work of formatting. It is exposed as a
      separate function for cases where you want to pass in a predefined
      dictionary of arguments, rather than unpacking and repacking the
      dictionary as individual arguments using the ``*args`` and ``**kwargs``
      syntax. :meth:`vformat` does the work of breaking up the format template
      string into character data and replacement fields. It calls the various
      methods described below.

   In addition, the :class:`Formatter` defines a number of methods that are
   intended to be replaced by subclasses:

   .. method:: parse(format_string)

      Loop over the *format_string* and return an iterable of tuples
      (*literal_text*, *field_name*, *format_spec*, *conversion*). This is used
      by :meth:`vformat` to break the string into either literal text or
      replacement fields.

      The values in the tuple conceptually represent a span of literal text
      followed by a single replacement field. If there is no literal text
      (which can happen if two replacement fields occur consecutively), then
      *literal_text* will be a zero-length string. If there is no replacement
      field, then the values of *field_name*, *format_spec* and *conversion*
      will be ``None``.

   .. method:: get_field(field_name, args, kwargs)

      Given *field_name* as returned by :meth:`parse` (see above), convert it to
      an object to be formatted. Returns a tuple (obj, used_key). The default
      version takes strings of the form defined in :pep:`3101`, such as
      "0[name]" or "label.title". *args* and *kwargs* are as passed in to
      :meth:`vformat`. The return value *used_key* has the same meaning as the
      *key* parameter to :meth:`get_value`.

   .. method:: get_value(key, args, kwargs)

      Retrieve a given field value. The *key* argument will be either an
      integer or a string. If it is an integer, it represents the index of the
      positional argument in *args*; if it is a string, then it represents a
      named argument in *kwargs*.

      The *args* parameter is set to the list of positional arguments to
      :meth:`vformat`, and the *kwargs* parameter is set to the dictionary of
      keyword arguments.

      For compound field names, these functions are only called for the first
      component of the field name; subsequent components are handled through
      normal attribute and indexing operations.

      So for example, the field expression '0.name' would cause
      :meth:`get_value` to be called with a *key* argument of 0. The ``name``
      attribute will be looked up after :meth:`get_value` returns by calling the
      built-in :func:`getattr` function.

      If the index or keyword refers to an item that does not exist, then an
      :exc:`IndexError` or :exc:`KeyError` should be raised.

   .. method:: check_unused_args(used_args, args, kwargs)

      Implement checking for unused arguments if desired. The arguments to this
      function are the set of all argument keys that were actually referred to
      in the format string (integers for positional arguments, and strings for
      named arguments), and a reference to the *args* and *kwargs* that were
      passed to :meth:`vformat`. The set of unused args can be calculated from
      these parameters. :meth:`check_unused_args` is assumed to raise an
      exception if the check fails.

   .. method:: format_field(value, format_spec)

      :meth:`format_field` simply calls the global :func:`format` built-in. The
      method is provided so that subclasses can override it.

   .. method:: convert_field(value, conversion)

      Converts the value (returned by :meth:`get_field`) given a conversion type
      (as in the tuple returned by the :meth:`parse` method). The default
      version understands 'r' (repr) and 's' (str) conversion types.


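The following is a minimal sketch of how a subclass might override
:meth:`get_value`, assuming the goal is to substitute a default for missing
keyword arguments. The class name ``DefaultingFormatter`` and its ``default``
parameter are hypothetical. The last line shows the tuples produced by
:meth:`parse` for a small template.

```python
import string


class DefaultingFormatter(string.Formatter):
    """Hypothetical subclass: missing keyword fields fall back to a default."""

    def __init__(self, default="<missing>"):
        string.Formatter.__init__(self)
        self.default = default

    def get_value(self, key, args, kwargs):
        # String keys name keyword arguments; integer keys index into args.
        if isinstance(key, str):
            return kwargs.get(key, self.default)
        return string.Formatter.get_value(self, key, args, kwargs)


fmt = DefaultingFormatter()
greeting = fmt.format("{0} meets {name}", "Arthur", name="Lancelot")
fallback = fmt.format("{0} meets {name}", "Arthur")

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
pieces = list(fmt.parse("x={0!r:>5} end"))
```

Because only the first component of a compound field name reaches
:meth:`get_value`, a field such as ``{user.name}`` would still fall back to the
default for ``user`` and then look up ``name`` on that default value.
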
.. _formatstrings:

Format String Syntax
--------------------

The :meth:`str.format` method and the :class:`Formatter` class share the same
syntax for format strings (although in the case of :class:`Formatter`,
subclasses can define their own format string syntax).

Format strings contain "replacement fields" surrounded by curly braces ``{}``.
Anything that is not contained in braces is considered literal text, which is
copied unchanged to the output. If you need to include a brace character in the
literal text, it can be escaped by doubling: ``{{`` and ``}}``.

The grammar for a replacement field is as follows:

.. productionlist:: sf
   replacement_field: "{" `field_name` ["!" `conversion`] [":" `format_spec`] "}"
   field_name: (`identifier` | `integer`) ("." `attribute_name` | "[" `element_index` "]")*
   attribute_name: `identifier`
   element_index: `integer`
   conversion: "r" | "s"
   format_spec: <described in the next section>

In less formal terms, the replacement field starts with a *field_name*, which
can either be a number (for a positional argument), or an identifier (for
keyword arguments). Following this is an optional *conversion* field, which is
preceded by an exclamation point ``'!'``, and a *format_spec*, which is preceded
by a colon ``':'``.

The *field_name* itself begins with either a number or a keyword. If it's a
number, it refers to a positional argument, and if it's a keyword it refers to a
named keyword argument. This can be followed by any number of index or
attribute expressions. An expression of the form ``'.name'`` selects the named
attribute using :func:`getattr`, while an expression of the form ``'[index]'``
does an index lookup using :meth:`__getitem__`.

Some simple format string examples::

   "First, thou shalt count to {0}" # References first positional argument
   "My quest is {name}"             # References keyword argument 'name'
   "Weight in tons {0.weight}"      # 'weight' attribute of first positional arg
   "Units destroyed: {players[0]}"  # First element of keyword argument 'players'.

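The format strings above can be exercised with :meth:`str.format` as in the
sketch below. The ``Swallow`` class and the argument values are made up for
illustration.

```python
class Swallow(object):
    """Hypothetical object exposing a ``weight`` attribute."""

    def __init__(self, weight):
        self.weight = weight


count = "First, thou shalt count to {0}".format(3)
quest = "My quest is {name}".format(name="to seek the Holy Grail")
# '.weight' is resolved with getattr() on the first positional argument.
weight = "Weight in tons {0.weight}".format(Swallow(2))
# '[0]' is resolved with an index lookup on the keyword argument.
units = "Units destroyed: {players[0]}".format(players=[7, 2])
# Doubled braces are copied to the output as single literal braces.
braces = "{{{0}}}".format("literal")
```
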
The *conversion* field causes a type coercion before formatting. Normally, the
job of formatting a value is done by the :meth:`__format__` method of the value
itself. However, in some cases it is desirable to force a type to be formatted
as a string, overriding its own definition of formatting. By converting the
value to a string before calling :meth:`__format__`, the normal formatting logic
is bypassed.

Two conversion flags are currently supported: ``'!s'`` which calls :func:`str`
on the value, and ``'!r'`` which calls :func:`repr`.

Some examples::

   "Harold's a clever {0!s}"        # Calls str() on the argument first
   "Bring out the holy {name!r}"    # Calls repr() on the argument first

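A small sketch of how a conversion flag bypasses an object's own
:meth:`__format__`. The ``Loud`` class is hypothetical and exists only to make
the difference visible.

```python
class Loud(object):
    """Hypothetical type whose __format__ and __str__ deliberately differ."""

    def __format__(self, format_spec):
        return "LOUD"

    def __str__(self):
        return "quiet"


# Without a conversion, the object's own __format__ is used.
own_format = "{0}".format(Loud())
# With '!s', str() is applied first and the resulting string is formatted.
forced_str = "{0!s}".format(Loud())
# '!r' applies repr(), which adds quotes around a plain string.
holy = "Bring out the holy {name!r}".format(name="hand grenade")
```
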
The *format_spec* field contains a specification of how the value should be
presented, including such details as field width, alignment, padding, decimal
precision and so on. Each value type can define its own "formatting
mini-language" or interpretation of the *format_spec*.

Most built-in types support a common formatting mini-language, which is
described in the next section.

A *format_spec* field can also include nested replacement fields within it.
These nested replacement fields can contain only a field name; conversion flags
and format specifications are not allowed. The replacement fields within the
*format_spec* are substituted before the *format_spec* string is interpreted.
This allows the formatting of a value to be dynamically specified.

For example, suppose you wanted to have a replacement field whose field width is
determined by another variable::

   "A man with two {0:{1}}".format("noses", 10)

This would first evaluate the inner replacement field, making the format string
effectively::

   "A man with two {0:10}"

Then the outer replacement field would be evaluated, producing::

   "noses     "

This is substituted into the string, yielding::

   "A man with two noses     "

(The extra space is because we specified a field width of 10, and because left
alignment is the default for strings.)


.. _formatspec:

Format Specification Mini-Language
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

"Format specifications" are used within replacement fields contained within a
format string to define how individual values are presented (see
:ref:`formatstrings`). They can also be passed directly to the built-in
:func:`format` function. Each formattable type may define how the format
specification is to be interpreted.

Most built-in types implement the following options for format specifications,
although some of the formatting options are only supported by the numeric types.

A general convention is that an empty format string (``""``) produces the same
result as if you had called :func:`str` on the value.

The general form of a *standard format specifier* is:

.. productionlist:: sf
   format_spec: [[`fill`]`align`][`sign`][#][0][`width`][.`precision`][`type`]
   fill: <a character other than '}'>
   align: "<" | ">" | "=" | "^"
   sign: "+" | "-" | " "
   width: `integer`
   precision: `integer`
   type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "x" | "X" | "%"

The *fill* character can be any character other than '}' (which signifies the
end of the field). The presence of a fill character is signaled by the *next*
character, which must be one of the alignment options. If the second character
of *format_spec* is not a valid alignment option, then it is assumed that both
the fill character and the alignment option are absent.

The meaning of the various alignment options is as follows:

   +---------+----------------------------------------------------------+
   | Option  | Meaning                                                  |
   +=========+==========================================================+
   | ``'<'`` | Forces the field to be left-aligned within the available |
   |         | space (this is the default).                             |
   +---------+----------------------------------------------------------+
   | ``'>'`` | Forces the field to be right-aligned within the          |
   |         | available space.                                         |
   +---------+----------------------------------------------------------+
   | ``'='`` | Forces the padding to be placed after the sign (if any)  |
   |         | but before the digits. This is used for printing fields  |
   |         | in the form '+000000120'. This alignment option is only  |
   |         | valid for numeric types.                                 |
   +---------+----------------------------------------------------------+
   | ``'^'`` | Forces the field to be centered within the available     |
   |         | space.                                                   |
   +---------+----------------------------------------------------------+

Note that unless a minimum field width is defined, the field width will always
be the same size as the data to fill it, so that the alignment option has no
meaning in this case.

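The alignment options can be tried directly with the built-in :func:`format`
function. The values below are arbitrary and chosen only to make the padding
visible.

```python
left = format("ab", "<6")        # pad on the right
right = format("ab", ">6")       # pad on the left
centered = format("ab", "^6")    # pad on both sides
starred = format("ab", "*>6")    # '*' is the fill character, '>' the alignment
padded = format(120, "0=+10")    # padding goes between the sign and the digits
no_width = format("ab", ">")     # no width given, so alignment has no effect
```
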
The *sign* option is only valid for number types, and can be one of the
following:

   +---------+----------------------------------------------------------+
   | Option  | Meaning                                                  |
   +=========+==========================================================+
   | ``'+'`` | Indicates that a sign should be used for both            |
   |         | positive as well as negative numbers.                    |
   +---------+----------------------------------------------------------+
   | ``'-'`` | Indicates that a sign should be used only for negative   |
   |         | numbers (this is the default behavior).                  |
   +---------+----------------------------------------------------------+
   | space   | Indicates that a leading space should be used on         |
   |         | positive numbers, and a minus sign on negative numbers.  |
   +---------+----------------------------------------------------------+

The ``'#'`` option is only valid for integers, and only for binary, octal, or
hexadecimal output. If present, it specifies that the output will be prefixed
by ``'0b'``, ``'0o'``, or ``'0x'``, respectively.

*width* is a decimal integer defining the minimum field width. If not
specified, then the field width will be determined by the content.

If the *width* field is preceded by a zero (``'0'``) character, this enables
zero-padding. This is equivalent to an *alignment* type of ``'='`` and a *fill*
character of ``'0'``.

The *precision* is a decimal number indicating how many digits should be
displayed after the decimal point for a floating point value formatted with
``'f'`` and ``'F'``, or before and after the decimal point for a floating point
value formatted with ``'g'`` or ``'G'``. For non-number types the field
indicates the maximum field size, that is, how many characters will be
used from the field content. The *precision* is ignored for integer values.

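A short sketch of the *sign*, ``'#'``, zero-padding and *precision* options,
again using the built-in :func:`format` function with arbitrary sample values.

```python
# Sign options: always show, only when negative (default), or leading space.
signs = [format(42, "+"), format(42, "-"), format(42, " "), format(-42, " ")]

# '#' adds the base prefix for binary, octal and hexadecimal output.
prefixed = [format(5, "#b"), format(8, "#o"), format(255, "#x")]

# A leading zero before the width pads with zeros after the sign.
zero_padded = format(-42, "06")

# Precision: digits after the point for 'f', significant digits for 'g',
# and a maximum number of characters for strings.
fixed = format(3.14159, ".2f")
general = format(3.14159, ".3g")
truncated = format("abcdef", ".3")
```
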
Finally, the *type* determines how the data should be presented.

The available integer presentation types are:

   +---------+----------------------------------------------------------+
   | Type    | Meaning                                                  |
   +=========+==========================================================+
   | ``'b'`` | Binary format. Outputs the number in base 2.             |
   +---------+----------------------------------------------------------+
   | ``'c'`` | Character. Converts the integer to the corresponding     |
   |         | Unicode character before printing.                       |
   +---------+----------------------------------------------------------+
   | ``'d'`` | Decimal integer. Outputs the number in base 10.          |
   +---------+----------------------------------------------------------+
   | ``'o'`` | Octal format. Outputs the number in base 8.              |
   +---------+----------------------------------------------------------+
   | ``'x'`` | Hex format. Outputs the number in base 16, using         |
   |         | lowercase letters for the digits above 9.                |
   +---------+----------------------------------------------------------+
   | ``'X'`` | Hex format. Outputs the number in base 16, using         |
   |         | uppercase letters for the digits above 9.                |
   +---------+----------------------------------------------------------+
   | ``'n'`` | Number. This is the same as ``'d'``, except that it uses |
   |         | the current locale setting to insert the appropriate     |
   |         | number separator characters.                             |
   +---------+----------------------------------------------------------+
   | None    | The same as ``'d'``.                                     |
   +---------+----------------------------------------------------------+

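The integer presentation types in the table can be compared side by side. The
sample values are arbitrary, and ``'n'`` is left out because its output depends
on the current locale.

```python
binary = format(10, "b")
character = format(65, "c")
decimal = format(10, "d")
octal = format(10, "o")
hex_lower = format(255, "x")
hex_upper = format(255, "X")
default = format(10, "")   # no type given: same as 'd'
```
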
The available presentation types for floating point and decimal values are:

   +---------+----------------------------------------------------------+
   | Type    | Meaning                                                  |
   +=========+==========================================================+
   | ``'e'`` | Exponent notation. Prints the number in scientific       |
   |         | notation using the letter 'e' to indicate the exponent.  |
   +---------+----------------------------------------------------------+
   | ``'E'`` | Exponent notation. Same as ``'e'`` except it uses an     |
   |         | uppercase 'E' as the separator character.                |
   +---------+----------------------------------------------------------+
   | ``'f'`` | Fixed point. Displays the number as a fixed-point        |
   |         | number.                                                  |
   +---------+----------------------------------------------------------+
   | ``'F'`` | Fixed point. Same as ``'f'``.                            |
   +---------+----------------------------------------------------------+
   | ``'g'`` | General format. This prints the number as a fixed-point  |
   |         | number, unless the number is too large, in which case    |
   |         | it switches to ``'e'`` exponent notation. Infinity and   |
   |         | NaN values are formatted as ``inf``, ``-inf`` and        |
   |         | ``nan``, respectively.                                   |
   +---------+----------------------------------------------------------+
   | ``'G'`` | General format. Same as ``'g'`` except switches to       |
   |         | ``'E'`` if the number gets too large. The                |
   |         | representations of infinity and NaN are uppercased, too. |
   +---------+----------------------------------------------------------+
   | ``'n'`` | Number. This is the same as ``'g'``, except that it uses |
   |         | the current locale setting to insert the appropriate     |
   |         | number separator characters.                             |
   +---------+----------------------------------------------------------+
   | ``'%'`` | Percentage. Multiplies the number by 100 and displays    |
   |         | in fixed (``'f'``) format, followed by a percent sign.   |
   +---------+----------------------------------------------------------+
   | None    | The same as ``'g'``.                                     |
   +---------+----------------------------------------------------------+


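The floating point presentation types can be illustrated the same way. The
values are arbitrary, and the locale-dependent ``'n'`` type is omitted.

```python
exponent = format(1234.5, "e")
exponent_upper = format(1234.5, "E")
fixed_point = format(1234.5, "f")
general_small = format(1234.5, "g")
general_large = format(1e20, "g")      # too large for fixed-point display
infinity = format(float("inf"), "g")
infinity_upper = format(float("inf"), "G")
percent = format(0.25, "%")
```
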
Template strings
----------------

Templates provide simpler string substitutions as described in :pep:`292`.
Instead of the normal ``%``\ -based substitutions, Templates support ``$``\
-based substitutions, using the following rules:

* ``$$`` is an escape; it is replaced with a single ``$``.

* ``$identifier`` names a substitution placeholder matching a mapping key of
  ``"identifier"``. By default, ``"identifier"`` must spell a Python
  identifier. The first non-identifier character after the ``$`` character
  terminates this placeholder specification.

* ``${identifier}`` is equivalent to ``$identifier``. It is required when valid
  identifier characters follow the placeholder but are not part of the
  placeholder, such as ``"${noun}ification"``.

Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
being raised.

.. versionadded:: 2.4

The :mod:`string` module provides a :class:`Template` class that implements
these rules. The methods of :class:`Template` are:


.. class:: Template(template)

   The constructor takes a single argument which is the template string.


   .. method:: substitute(mapping[, **kws])

      Performs the template substitution, returning a new string. *mapping* is
      any dictionary-like object with keys that match the placeholders in the
      template. Alternatively, you can provide keyword arguments, where the
      keywords are the placeholders. When both *mapping* and *kws* are given
      and there are duplicates, the placeholders from *kws* take precedence.


   .. method:: safe_substitute(mapping[, **kws])

      Like :meth:`substitute`, except that if placeholders are missing from
      *mapping* and *kws*, instead of raising a :exc:`KeyError` exception, the
      original placeholder will appear in the resulting string intact. Also,
      unlike with :meth:`substitute`, any other appearances of the ``$`` will
      simply return ``$`` instead of raising :exc:`ValueError`.

      While other exceptions may still occur, this method is called "safe"
      because substitution always tries to return a usable string instead of
      raising an exception. In another sense, :meth:`safe_substitute` may be
      anything other than safe, since it will silently ignore malformed
      templates containing dangling delimiters, unmatched braces, or
      placeholders that are not valid Python identifiers.

:class:`Template` instances also provide one public data attribute:


.. attribute:: string.template

   This is the object passed to the constructor's *template* argument. In general,
   you shouldn't change it, but read-only access is not enforced.

Here is an example of how to use a Template:

   >>> from string import Template
   >>> s = Template('$who likes $what')
   >>> s.substitute(who='tim', what='kung pao')
   'tim likes kung pao'
   >>> d = dict(who='tim')
   >>> Template('Give $who $100').substitute(d)
   Traceback (most recent call last):
   [...]
   ValueError: Invalid placeholder in string: line 1, col 10
   >>> Template('$who likes $what').substitute(d)
   Traceback (most recent call last):
   [...]
   KeyError: 'what'
   >>> Template('$who likes $what').safe_substitute(d)
   'tim likes $what'

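A few further cases, sketched under the rules listed earlier in this section:
the ``$$`` escape, the braced form, the precedence of keyword arguments over
the mapping, and the way :meth:`safe_substitute` leaves an invalid placeholder
untouched. The sample strings are made up for illustration.

```python
from string import Template

# '${noun}' is needed because 'ification' would otherwise join the name;
# '$$' produces a literal dollar sign.
price = Template("${noun}ification costs $$5").substitute(noun="gam")

# Both the mapping and a keyword supply 'who'; the keyword takes precedence.
merged = Template("$who likes $what").substitute(
    {"who": "tim", "what": "tea"}, who="eric")

# '$100' is not a valid placeholder; safe_substitute() leaves it as is.
partial = Template("$who owes $100").safe_substitute(who="tim")
```
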
Advanced usage: you can derive subclasses of :class:`Template` to customize the
placeholder syntax, delimiter character, or the entire regular expression used
to parse template strings. To do this, you can override these class attributes:

* *delimiter* -- This is the literal string describing a placeholder introducing
  delimiter. The default value is ``$``. Note that this should *not* be a
  regular expression, as the implementation will call :meth:`re.escape` on this
  string as needed.

* *idpattern* -- This is the regular expression describing the pattern for
  non-braced placeholders (the braces will be added automatically as
  appropriate). The default value is the regular expression
  ``[_a-z][_a-z0-9]*``.

Alternatively, you can provide the entire regular expression pattern by
overriding the class attribute *pattern*. If you do this, the value must be a
regular expression object with four named capturing groups. The capturing
groups correspond to the rules given above, along with the invalid placeholder
rule:

* *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the
  default pattern.

* *named* -- This group matches the unbraced placeholder name; it should not
  include the delimiter in the capturing group.

* *braced* -- This group matches the brace enclosed placeholder name; it should
  not include either the delimiter or braces in the capturing group.

* *invalid* -- This group matches any other delimiter pattern (usually a single
  delimiter), and it should appear last in the regular expression.


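As a minimal sketch of the simplest customization, the subclass below changes
only the *delimiter* class attribute. The class name ``PercentTemplate`` is
hypothetical.

```python
from string import Template


class PercentTemplate(Template):
    """Hypothetical subclass that uses '%' instead of '$' as the delimiter."""

    delimiter = "%"


# '$' is now ordinary text; '%who' and '%{who}' are placeholders.
bill = PercentTemplate("%who owes $5, or %{who}'s friend does").substitute(
    {"who": "tim"})

# Doubling the new delimiter escapes it.
escaped = PercentTemplate("100%% sure").substitute({})
```

Overriding *delimiter* or *idpattern* keeps the four named groups of the
default pattern, so it is usually less error-prone than replacing *pattern*
as a whole.
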
String functions
----------------

The following functions are available to operate on string and Unicode objects.
They are not available as string methods.


.. function:: capwords(s)

   Split the argument into words using :func:`split`, capitalize each word using
   :func:`capitalize`, and join the capitalized words using :func:`join`. Note
   that this replaces runs of whitespace characters by a single space, and removes
   leading and trailing whitespace.


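A brief sketch of :func:`capwords` on an arbitrary sample string, showing both
the capitalization and the whitespace normalization described above.

```python
import string

# Runs of spaces collapse to one; leading and trailing whitespace is removed.
title = string.capwords("  the   knights who say ni ")
```
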
.. function:: maketrans(from, to)

   Return a translation table suitable for passing to :func:`translate`, that will
   map each character in *from* into the character at the same position in *to*;
   *from* and *to* must have the same length.

   .. warning::

      Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
      arguments; in some locales, these don't have the same length. For case
      conversions, always use :func:`lower` and :func:`upper`.


605 Deprecated string functions |
|
606 --------------------------- |
|
607 |
|
608 The following list of functions are also defined as methods of string and |
|
609 Unicode objects; see section :ref:`string-methods` for more information on |
|
610 those. You should consider these functions as deprecated, although they will |
|
611 not be removed until Python 3.0. The functions defined in this module are: |
|
612 |
|
613 |
|
614 .. function:: atof(s) |
|
615 |
|
616 .. deprecated:: 2.0 |
|
617 Use the :func:`float` built-in function. |
|
618 |
|
619 .. index:: builtin: float |
|
620 |
|
621 Convert a string to a floating point number. The string must have the standard |
|
622 syntax for a floating point literal in Python, optionally preceded by a sign |
|
623 (``+`` or ``-``). Note that this behaves identical to the built-in function |
|
624 :func:`float` when passed a string. |
|
625 |
|
626 .. note:: |
|
627 |
|
628 .. index:: |
|
629 single: NaN |
|
630 single: Infinity |
|
631 |
|
632 When passing in a string, values for NaN and Infinity may be returned, depending |
|
633 on the underlying C library. The specific set of strings accepted which cause |
|
634 these values to be returned depends entirely on the C library and is known to |
|
635 vary. |
|
636 |
|
637 |
|
638 .. function:: atoi(s[, base]) |
|
639 |
|
640 .. deprecated:: 2.0 |
|
641 Use the :func:`int` built-in function. |
|
642 |
|
643 .. index:: builtin: eval |
|
644 |
|
645 Convert string *s* to an integer in the given *base*. The string must consist |
|
646 of one or more digits, optionally preceded by a sign (``+`` or ``-``). The |
|
647 *base* defaults to 10. If it is 0, a default base is chosen depending on the |
|
648 leading characters of the string (after stripping the sign): ``0x`` or ``0X`` |
|
649 means 16, ``0`` means 8, anything else means 10. If *base* is 16, a leading |
|
650 ``0x`` or ``0X`` is always accepted, though not required. This behaves |
|
651 identically to the built-in function :func:`int` when passed a string. (Also |
|
652 note: for a more flexible interpretation of numeric literals, use the built-in |
|
653 function :func:`eval`.) |
|
654 |
|
655 |
|
656 .. function:: atol(s[, base]) |
|
657 |
|
658 .. deprecated:: 2.0 |
|
659 Use the :func:`long` built-in function. |
|
660 |
|
661 .. index:: builtin: long |
|
662 |
|
663 Convert string *s* to a long integer in the given *base*. The string must |
|
664 consist of one or more digits, optionally preceded by a sign (``+`` or ``-``). |
|
665 The *base* argument has the same meaning as for :func:`atoi`. A trailing ``l`` |
|
666 or ``L`` is not allowed, except if the base is 0. Note that when invoked |
|
667 without *base* or with *base* set to 10, this behaves identically to the built-in |
|
668 function :func:`long` when passed a string. |
|
669 |
|
670 |
|
671 .. function:: capitalize(word) |
|
672 |
|
673 Return a copy of *word* with only its first character capitalized. |
|
674 |
|
675 |
|
676 .. function:: expandtabs(s[, tabsize]) |
|
677 |
|
678 Expand tabs in a string replacing them by one or more spaces, depending on the |
|
679 current column and the given tab size. The column number is reset to zero after |
|
680 each newline occurring in the string. This doesn't understand other non-printing |
|
681 characters or escape sequences. The tab size defaults to 8. |
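A small sketch of the column behavior, using the equivalent ``expandtabs`` string method (an assumption made so the example does not depend on the legacy function):

```python
# Each tab advances to the next multiple of the tab size.
row = "a\tbc\tdef".expandtabs(4)
# The column counter restarts after a newline.
multi = "ab\n\tc".expandtabs(4)
# With no argument the tab size is 8.
default = "x\ty".expandtabs()

print(repr(row))    # 'a   bc  def'
print(repr(multi))  # 'ab\n    c'
```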
|
682 |
|
683 |
|
684 .. function:: find(s, sub[, start[, end]]) |
|
685 |
|
686 Return the lowest index in *s* where the substring *sub* is found such that |
|
687 *sub* is wholly contained in ``s[start:end]``. Return ``-1`` on failure. |
|
688 Defaults for *start* and *end* and interpretation of negative values are the same |
|
689 as for slices. |
|
690 |
|
691 |
|
692 .. function:: rfind(s, sub[, start[, end]]) |
|
693 |
|
694 Like :func:`find` but find the highest index. |
|
695 |
|
696 |
|
697 .. function:: index(s, sub[, start[, end]]) |
|
698 |
|
699 Like :func:`find` but raise :exc:`ValueError` when the substring is not found. |
|
700 |
|
701 |
|
702 .. function:: rindex(s, sub[, start[, end]]) |
|
703 |
|
704 Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found. |
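The four search functions can be compared side by side. This sketch uses the equivalent string methods and an arbitrary sample string.

```python
s = "spam, eggs, spam"

first = s.find("spam")            # lowest index
last = s.rfind("spam")            # highest index
from_one = s.find("spam", 1)      # search restricted to s[1:]
cut_off = s.find("spam", 1, 15)   # match must fit wholly inside s[1:15]
missing = s.find("ham")           # -1 on failure

# index() and rindex() raise ValueError instead of returning -1.
try:
    s.index("ham")
    raised = False
except ValueError:
    raised = True

print(first, last, from_one, cut_off, missing, raised)
# 0 12 12 -1 -1 True
```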
|
705 |
|
706 |
|
707 .. function:: count(s, sub[, start[, end]]) |
|
708 |
|
709 Return the number of (non-overlapping) occurrences of substring *sub* in string |
|
710 ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative |
|
711 values are the same as for slices. |
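A brief sketch of non-overlapping counting and slice-style bounds, using the equivalent ``count`` string method:

```python
# "aa" occurs at positions 0 and 2; overlapping matches are not counted.
pairs = "aaaa".count("aa")
# start and end are interpreted like slice indices.
tail = "banana".count("a", 2)    # counts within "nana"
neg = "banana".count("a", -3)    # counts within "ana"
print(pairs, tail, neg)  # 2 2 2
```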
|
712 |
|
713 |
|
714 .. function:: lower(s) |
|
715 |
|
716 Return a copy of *s*, but with upper case letters converted to lower case. |
|
717 |
|
718 |
|
719 .. function:: split(s[, sep[, maxsplit]]) |
|
720 |
|
721 Return a list of the words of the string *s*. If the optional second argument |
|
722 *sep* is absent or ``None``, the words are separated by arbitrary strings of |
|
723 whitespace characters (space, tab, newline, return, formfeed). If the second |
|
724 argument *sep* is present and not ``None``, it specifies a string to be used as |
|
725 the word separator. The returned list will then have one more item than the |
|
726 number of non-overlapping occurrences of the separator in the string. The |
|
727 optional third argument *maxsplit* defaults to 0. If it is nonzero, at most |
|
728 *maxsplit* number of splits occur, and the remainder of the string is returned |
|
729 as the final element of the list (thus, the list will have at most |
|
730 ``maxsplit+1`` elements). |
|
731 |
|
732 The behavior of split on an empty string depends on the value of *sep*. If *sep* |
|
733 is not specified, or specified as ``None``, the result will be an empty list. |
|
734 If *sep* is specified as any string, the result will be a list containing one |
|
735 element which is an empty string. |
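The separator rules and the empty-string cases can be illustrated as follows. The sketch uses the equivalent ``split`` string method with made-up inputs.

```python
# No separator: runs of whitespace delimit the words.
words = "  a  b\tc\n".split()
# Explicit separator: one more item than the number of separators.
fields = "a,b,,c".split(",")
# At most two splits; the remainder is the final element.
limited = "a,b,c,d".split(",", 2)
# Splitting an empty string depends on whether sep is given.
empty_ws = "".split()
empty_sep = "".split(",")

print(words, fields, limited, empty_ws, empty_sep)
# ['a', 'b', 'c'] ['a', 'b', '', 'c'] ['a', 'b', 'c,d'] [] ['']
```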
|
736 |
|
737 |
|
738 .. function:: rsplit(s[, sep[, maxsplit]]) |
|
739 |
|
740 Return a list of the words of the string *s*, scanning *s* from the end. To all |
|
741 intents and purposes, the resulting list of words is the same as returned by |
|
742 :func:`split`, except when the optional third argument *maxsplit* is explicitly |
|
743 specified and nonzero. When *maxsplit* is nonzero, at most *maxsplit* number of |
|
744 splits -- the *rightmost* ones -- occur, and the remainder of the string is |
|
745 returned as the first element of the list (thus, the list will have at most |
|
746 ``maxsplit+1`` elements). |
|
747 |
|
748 .. versionadded:: 2.4 |
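The difference from :func:`split` only shows up when *maxsplit* limits the number of splits, as this sketch with the equivalent string methods suggests:

```python
s = "a,b,c,d"
left = s.split(",", 2)    # leftmost splits; remainder is the last element
right = s.rsplit(",", 2)  # rightmost splits; remainder is the first element
same = s.split(",") == s.rsplit(",")  # identical when no limit is given

print(left)   # ['a', 'b', 'c,d']
print(right)  # ['a,b', 'c', 'd']
```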
|
749 |
|
750 |
|
751 .. function:: splitfields(s[, sep[, maxsplit]]) |
|
752 |
|
753 This function behaves identically to :func:`split`. (In the past, :func:`split` |
|
754 was only used with one argument, while :func:`splitfields` was only used with |
|
755 two arguments.) |
|
756 |
|
757 |
|
758 .. function:: join(words[, sep]) |
|
759 |
|
760 Concatenate a list or tuple of words with intervening occurrences of *sep*. |
|
761 The default value for *sep* is a single space character. It is always true that |
|
762 ``string.join(string.split(s, sep), sep)`` equals *s*. |
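The round trip mentioned above can be checked with the equivalent string methods. The sketch assumes an explicit, non-empty separator is used for both the split and the join.

```python
s = "a,b,,c"
sep = ","

joined = " ".join(["spam", "eggs"])  # a single space between words
round_trip = sep.join(s.split(sep))  # splitting then joining restores s

print(joined)           # spam eggs
print(round_trip == s)  # True
```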
|
763 |
|
764 |
|
765 .. function:: joinfields(words[, sep]) |
|
766 |
|
767 This function behaves identically to :func:`join`. (In the past, :func:`join` |
|
768 was only used with one argument, while :func:`joinfields` was only used with two |
|
769 arguments.) Note that there is no :meth:`joinfields` method on string objects; |
|
770 use the :meth:`join` method instead. |
|
771 |
|
772 |
|
773 .. function:: lstrip(s[, chars]) |
|
774 |
|
775 Return a copy of the string with leading characters removed. If *chars* is |
|
776 omitted or ``None``, whitespace characters are removed. If given and not |
|
777 ``None``, *chars* must be a string; the characters in the string will be |
|
778 stripped from the beginning of *s*. |
|
779 |
|
780 .. versionchanged:: 2.2.3 |
|
781 The *chars* parameter was added. The *chars* parameter cannot be passed in |
|
782 earlier 2.2 versions. |
|
783 |
|
784 |
|
785 .. function:: rstrip(s[, chars]) |
|
786 |
|
787 Return a copy of the string with trailing characters removed. If *chars* is |
|
788 omitted or ``None``, whitespace characters are removed. If given and not |
|
789 ``None``, *chars* must be a string; the characters in the string will be |
|
790 stripped from the end of *s*. |
|
791 |
|
792 .. versionchanged:: 2.2.3 |
|
793 The *chars* parameter was added. The *chars* parameter cannot be passed in |
|
794 earlier 2.2 versions. |
|
795 |
|
796 |
|
797 .. function:: strip(s[, chars]) |
|
798 |
|
799 Return a copy of the string with leading and trailing characters removed. If |
|
800 *chars* is omitted or ``None``, whitespace characters are removed. If given and |
|
801 not ``None``, *chars* must be a string; the characters in the string will be |
|
802 stripped from both ends of *s*. |
|
803 |
|
804 .. versionchanged:: 2.2.3 |
|
805 The *chars* parameter was added. The *chars* parameter cannot be passed in |
|
806 earlier 2.2 versions. |
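The three stripping functions treat *chars* as a set of characters rather than a prefix or suffix. A short sketch with the equivalent string methods and illustrative inputs:

```python
plain = "  hi  ".strip()          # whitespace removed from both ends
left = "xxhixx".lstrip("x")       # leading characters only
right = "xxhixx".rstrip("x")      # trailing characters only
# Any character listed in chars is removed, in any order or combination.
host = "www.example.com".strip("cmowz.")

print(repr(plain), left, right, host)
# 'hi' hixx xxhi example
```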
|
807 |
|
808 |
|
809 .. function:: swapcase(s) |
|
810 |
|
811 Return a copy of *s*, but with lower case letters converted to upper case and |
|
812 vice versa. |
|
813 |
|
814 |
|
815 .. function:: translate(s, table[, deletechars]) |
|
816 |
|
817 Delete all characters from *s* that are in *deletechars* (if present), and then |
|
818 translate the characters using *table*, which must be a 256-character string |
|
819 giving the translation for each character value, indexed by its ordinal. If |
|
820 *table* is ``None``, then only the character deletion step is performed. |
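The delete-then-translate order can be sketched with ``bytes.translate``, which keeps the same 256-entry table and *deletechars* semantics. This is an assumption made so the example runs on Python 3; ``bytes.maketrans`` is used here only as a convenient way to build the table.

```python
# Build a 256-byte table mapping lowercase vowels to '*'.
table = bytes.maketrans(b"aeiou", b"*****")

# 'l' is deleted first, then the remaining bytes are translated.
both = b"hello world".translate(table, b"l")
# With a table of None, only the deletion step is performed.
delete_only = b"hello world".translate(None, b"lo")

print(len(table))   # 256
print(both)         # b'h** w*rd'
print(delete_only)  # b'he wrd'
```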
|
821 |
|
822 |
|
823 .. function:: upper(s) |
|
824 |
|
825 Return a copy of *s*, but with lower case letters converted to upper case. |
|
826 |
|
827 |
|
828 .. function:: ljust(s, width) |
|
829 rjust(s, width) |
|
830 center(s, width) |
|
831 |
|
832 These functions respectively left-justify, right-justify and center a string in |
|
833 a field of given width. They return a string that is at least *width* |
|
834 characters wide, created by padding the string *s* with spaces until the given |
|
835 width on the right, left or both sides. The string is never truncated. |
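A minimal sketch of the padding behavior, using the equivalent string methods. The widths are chosen so that centering pads both sides evenly.

```python
left = "ab".ljust(5)        # padded on the right
right = "ab".rjust(5)       # padded on the left
middle = "ab".center(6)     # padded on both sides
# A string longer than the width is returned unchanged, never truncated.
long_one = "abcdef".ljust(3)

print(repr(left), repr(right), repr(middle), repr(long_one))
# 'ab   ' '   ab' '  ab  ' 'abcdef'
```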
|
836 |
|
837 |
|
838 .. function:: zfill(s, width) |
|
839 |
|
840 Pad a numeric string on the left with zero digits until the given width is |
|
841 reached. Strings starting with a sign are handled correctly. |
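Sign handling can be seen in a short sketch with the equivalent ``zfill`` string method:

```python
plain = "42".zfill(5)      # zeros added on the left
signed = "-42".zfill(5)    # zeros go after the sign
wide = "12345".zfill(3)    # already wide enough, so unchanged

print(plain, signed, wide)  # 00042 -0042 12345
```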
|
842 |
|
843 |
|
844 .. function:: replace(str, old, new[, maxreplace]) |
|
845 |
|
846 Return a copy of string *str* with all occurrences of substring *old* replaced |
|
847 by *new*. If the optional argument *maxreplace* is given, the first |
|
848 *maxreplace* occurrences are replaced. |
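The effect of *maxreplace* can be sketched with the equivalent ``replace`` string method, where the limit is passed as the third argument:

```python
s = "a-b-c-d"
all_replaced = s.replace("-", "+")     # every occurrence
first_two = s.replace("-", "+", 2)     # only the first two occurrences

print(all_replaced)  # a+b+c+d
print(first_two)     # a+b+c-d
```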
|
849 |