|
1 ******************************** |
|
2 Functional Programming HOWTO |
|
3 ******************************** |
|
4 |
|
5 :Author: A. M. Kuchling |
|
6 :Release: 0.31 |
|
7 |
|
8 (This is a first draft. Please send comments/error reports/suggestions to |
|
9 amk@amk.ca.) |
|
10 |
|
11 In this document, we'll take a tour of Python's features suitable for |
|
12 implementing programs in a functional style. After an introduction to the |
|
13 concepts of functional programming, we'll look at language features such as |
|
14 :term:`iterator`\s and :term:`generator`\s and relevant library modules such as |
|
15 :mod:`itertools` and :mod:`functools`. |
|
16 |
|
17 |
|
18 Introduction |
|
19 ============ |
|
20 |
|
21 This section explains the basic concept of functional programming; if you're |
|
22 just interested in learning about Python language features, skip to the next |
|
23 section. |
|
24 |
|
25 Programming languages support decomposing problems in several different ways: |
|
26 |
|
27 * Most programming languages are **procedural**: programs are lists of |
|
28 instructions that tell the computer what to do with the program's input. C, |
|
29 Pascal, and even Unix shells are procedural languages. |
|
30 |
|
31 * In **declarative** languages, you write a specification that describes the |
|
32 problem to be solved, and the language implementation figures out how to |
|
33 perform the computation efficiently. SQL is the declarative language you're |
|
34 most likely to be familiar with; a SQL query describes the data set you want |
|
35 to retrieve, and the SQL engine decides whether to scan tables or use indexes, |
|
36 which subclauses should be performed first, etc. |
|
37 |
|
38 * **Object-oriented** programs manipulate collections of objects. Objects have |
|
39 internal state and support methods that query or modify this internal state in |
|
40 some way. Smalltalk and Java are object-oriented languages. C++ and Python |
|
41 are languages that support object-oriented programming, but don't force the |
|
42 use of object-oriented features. |
|
43 |
|
44 * **Functional** programming decomposes a problem into a set of functions. |
|
45 Ideally, functions only take inputs and produce outputs, and don't have any |
|
46 internal state that affects the output produced for a given input. Well-known |
|
47 functional languages include the ML family (Standard ML, OCaml, and other |
|
48 variants) and Haskell. |
|
49 |
|
50 The designers of some computer languages choose to emphasize one |
|
51 particular approach to programming. This often makes it difficult to |
|
52 write programs that use a different approach. Other languages are |
|
53 multi-paradigm languages that support several different approaches. |
|
54 Lisp, C++, and Python are multi-paradigm; you can write programs or |
|
55 libraries that are largely procedural, object-oriented, or functional |
|
56 in all of these languages. In a large program, different sections |
|
57 might be written using different approaches; the GUI might be |
|
58 object-oriented while the processing logic is procedural or |
|
59 functional, for example. |
|
60 |
|
61 In a functional program, input flows through a set of functions. Each function |
|
62 operates on its input and produces some output. Functional style discourages |
|
63 functions with side effects that modify internal state or make other changes |
|
64 that aren't visible in the function's return value. Functions that have no side |
|
65 effects at all are called **purely functional**. Avoiding side effects means |
|
66 not using data structures that get updated as a program runs; every function's |
|
67 output must only depend on its input. |
|
68 |
|
69 Some languages are very strict about purity and don't even have assignment |
|
70 statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all |
|
71 side effects. Printing to the screen or writing to a disk file are side |
|
72 effects, for example. For example, in Python a ``print`` statement or a |
|
73 ``time.sleep(1)`` both return no useful value; they're only called for their |
|
74 side effects of sending some text to the screen or pausing execution for a |
|
75 second. |
|
76 |
|
77 Python programs written in functional style usually won't go to the extreme of |
|
78 avoiding all I/O or all assignments; instead, they'll provide a |
|
79 functional-appearing interface but will use non-functional features internally. |
|
80 For example, the implementation of a function will still use assignments to |
|
81 local variables, but won't modify global variables or have other side effects. |
|
82 |
|
83 Functional programming can be considered the opposite of object-oriented |
|
84 programming. Objects are little capsules containing some internal state along |
|
85 with a collection of method calls that let you modify this state, and programs |
|
86 consist of making the right set of state changes. Functional programming wants |
|
87 to avoid state changes as much as possible and works with data flowing between |
|
88 functions. In Python you might combine the two approaches by writing functions |
|
89 that take and return instances representing objects in your application (e-mail |
|
90 messages, transactions, etc.). |
|
91 |
|
92 Functional design may seem like an odd constraint to work under. Why should you |
|
93 avoid objects and side effects? There are theoretical and practical advantages |
|
94 to the functional style: |
|
95 |
|
96 * Formal provability. |
|
97 * Modularity. |
|
98 * Composability. |
|
99 * Ease of debugging and testing. |
|
100 |
|
101 |
|
102 Formal provability |
|
103 ------------------ |
|
104 |
|
105 A theoretical benefit is that it's easier to construct a mathematical proof that |
|
106 a functional program is correct. |
|
107 |
|
108 For a long time researchers have been interested in finding ways to |
|
109 mathematically prove programs correct. This is different from testing a program |
|
110 on numerous inputs and concluding that its output is usually correct, or reading |
|
111 a program's source code and concluding that the code looks right; the goal is |
|
112 instead a rigorous proof that a program produces the right result for all |
|
113 possible inputs. |
|
114 |
|
115 The technique used to prove programs correct is to write down **invariants**, |
|
116 properties of the input data and of the program's variables that are always |
|
117 true. For each line of code, you then show that if invariants X and Y are true |
|
118 **before** the line is executed, the slightly different invariants X' and Y' are |
|
119 true **after** the line is executed. This continues until you reach the end of |
|
120 the program, at which point the invariants should match the desired conditions |
|
121 on the program's output. |
|
122 |
|
123 Functional programming's avoidance of assignments arose because assignments are |
|
124 difficult to handle with this technique; assignments can break invariants that |
|
125 were true before the assignment without producing any new invariants that can be |
|
126 propagated onward. |
|
127 |
|
128 Unfortunately, proving programs correct is largely impractical and not relevant |
|
129 to Python software. Even trivial programs require proofs that are several pages |
|
130 long; the proof of correctness for a moderately complicated program would be |
|
131 enormous, and few or none of the programs you use daily (the Python interpreter, |
|
132 your XML parser, your web browser) could be proven correct. Even if you wrote |
|
133 down or generated a proof, there would then be the question of verifying the |
|
134 proof; maybe there's an error in it, and you wrongly believe you've proved the |
|
135 program correct. |
|
136 |
|
137 |
|
138 Modularity |
|
139 ---------- |
|
140 |
|
141 A more practical benefit of functional programming is that it forces you to |
|
142 break apart your problem into small pieces. Programs are more modular as a |
|
143 result. It's easier to specify and write a small function that does one thing |
|
144 than a large function that performs a complicated transformation. Small |
|
145 functions are also easier to read and to check for errors. |
|
146 |
|
147 |
|
148 Ease of debugging and testing |
|
149 ----------------------------- |
|
150 |
|
151 Testing and debugging a functional-style program is easier. |
|
152 |
|
153 Debugging is simplified because functions are generally small and clearly |
|
154 specified. When a program doesn't work, each function is an interface point |
|
155 where you can check that the data are correct. You can look at the intermediate |
|
156 inputs and outputs to quickly isolate the function that's responsible for a bug. |
|
157 |
|
158 Testing is easier because each function is a potential subject for a unit test. |
|
159 Functions don't depend on system state that needs to be replicated before |
|
160 running a test; instead you only have to synthesize the right input and then |
|
161 check that the output matches expectations. |
|
162 |
|
163 |
|
164 Composability |
|
165 ------------- |
|
166 |
|
167 As you work on a functional-style program, you'll write a number of functions |
|
168 with varying inputs and outputs. Some of these functions will be unavoidably |
|
169 specialized to a particular application, but others will be useful in a wide |
|
170 variety of programs. For example, a function that takes a directory path and |
|
171 returns all the XML files in the directory, or a function that takes a filename |
|
172 and returns its contents, can be applied to many different situations. |
|
173 |
|
174 Over time you'll form a personal library of utilities. Often you'll assemble |
|
175 new programs by arranging existing functions in a new configuration and writing |
|
176 a few functions specialized for the current task. |
|
177 |
|
178 |
|
179 Iterators |
|
180 ========= |
|
181 |
|
182 I'll start by looking at a Python language feature that's an important |
|
183 foundation for writing functional-style programs: iterators. |
|
184 |
|
185 An iterator is an object representing a stream of data; this object returns the |
|
186 data one element at a time. A Python iterator must support a method called |
|
187 ``next()`` that takes no arguments and always returns the next element of the |
|
188 stream. If there are no more elements in the stream, ``next()`` must raise the |
|
189 ``StopIteration`` exception. Iterators don't have to be finite, though; it's |
|
190 perfectly reasonable to write an iterator that produces an infinite stream of |
|
191 data. |
|
192 |
|
193 The built-in :func:`iter` function takes an arbitrary object and tries to return |
|
194 an iterator that will return the object's contents or elements, raising |
|
195 :exc:`TypeError` if the object doesn't support iteration. Several of Python's |
|
196 built-in data types support iteration, the most common being lists and |
|
197 dictionaries. An object is called an **iterable** object if you can get an |
|
198 iterator for it. |
|
199 |
|
200 You can experiment with the iteration interface manually: |
|
201 |
|
202 >>> L = [1,2,3] |
|
203 >>> it = iter(L) |
|
204 >>> print it |
|
205 <...iterator object at ...> |
|
206 >>> it.next() |
|
207 1 |
|
208 >>> it.next() |
|
209 2 |
|
210 >>> it.next() |
|
211 3 |
|
212 >>> it.next() |
|
213 Traceback (most recent call last): |
|
214 File "<stdin>", line 1, in ? |
|
215 StopIteration |
|
216 >>> |
|
217 |
|
218 Python expects iterable objects in several different contexts, the most |
|
219 important being the ``for`` statement. In the statement ``for X in Y``, Y must |
|
220 be an iterator or some object for which ``iter()`` can create an iterator. |
|
221 These two statements are equivalent:: |
|
222 |
|
223 for i in iter(obj): |
|
224 print i |
|
225 |
|
226 for i in obj: |
|
227 print i |
|
228 |
|
229 Iterators can be materialized as lists or tuples by using the :func:`list` or |
|
230 :func:`tuple` constructor functions: |
|
231 |
|
232 >>> L = [1,2,3] |
|
233 >>> iterator = iter(L) |
|
234 >>> t = tuple(iterator) |
|
235 >>> t |
|
236 (1, 2, 3) |
|
237 |
|
238 Sequence unpacking also supports iterators: if you know an iterator will return |
|
239 N elements, you can unpack them into an N-tuple: |
|
240 |
|
241 >>> L = [1,2,3] |
|
242 >>> iterator = iter(L) |
|
243 >>> a,b,c = iterator |
|
244 >>> a,b,c |
|
245 (1, 2, 3) |
|
246 |
|
247 Built-in functions such as :func:`max` and :func:`min` can take a single |
|
248 iterator argument and will return the largest or smallest element. The ``"in"`` |
|
249 and ``"not in"`` operators also support iterators: ``X in iterator`` is true if |
|
250 X is found in the stream returned by the iterator. You'll run into obvious |
|
251 problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"`` |
|
252 will never return, and if the element X never appears in the stream, the |
|
253 ``"in"`` operator won't return either. |
|
254 |
|
255 Note that you can only go forward in an iterator; there's no way to get the |
|
256 previous element, reset the iterator, or make a copy of it. Iterator objects |
|
257 can optionally provide these additional capabilities, but the iterator protocol |
|
258 only specifies the ``next()`` method. Functions may therefore consume all of |
|
259 the iterator's output, and if you need to do something different with the same |
|
260 stream, you'll have to create a new iterator. |
|
261 |
|
262 |
|
263 |
|
264 Data Types That Support Iterators |
|
265 --------------------------------- |
|
266 |
|
267 We've already seen how lists and tuples support iterators. In fact, any Python |
|
268 sequence type, such as strings, will automatically support creation of an |
|
269 iterator. |
|
270 |
|
271 Calling :func:`iter` on a dictionary returns an iterator that will loop over the |
|
272 dictionary's keys: |
|
273 |
|
274 .. not a doctest since dict ordering varies across Pythons |
|
275 |
|
276 :: |
|
277 |
|
278 >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, |
|
279 ... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} |
|
280 >>> for key in m: |
|
281 ... print key, m[key] |
|
282 Mar 3 |
|
283 Feb 2 |
|
284 Aug 8 |
|
285 Sep 9 |
|
286 Apr 4 |
|
287 Jun 6 |
|
288 Jul 7 |
|
289 Jan 1 |
|
290 May 5 |
|
291 Nov 11 |
|
292 Dec 12 |
|
293 Oct 10 |
|
294 |
|
295 Note that the order is essentially random, because it's based on the hash |
|
296 ordering of the objects in the dictionary. |
|
297 |
|
298 Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries |
|
299 have methods that return other iterators. If you want to iterate over keys, |
|
300 values, or key/value pairs, you can explicitly call the ``iterkeys()``, |
|
301 ``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator. |
|
302 |
|
303 The :func:`dict` constructor can accept an iterator that returns a finite stream |
|
304 of ``(key, value)`` tuples: |
|
305 |
|
306 >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')] |
|
307 >>> dict(iter(L)) |
|
308 {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'} |
|
309 |
|
310 Files also support iteration by calling the ``readline()`` method until there |
|
311 are no more lines in the file. This means you can read each line of a file like |
|
312 this:: |
|
313 |
|
314 for line in file: |
|
315 # do something for each line |
|
316 ... |
|
317 |
|
318 Sets can take their contents from an iterable and let you iterate over the set's |
|
319 elements:: |
|
320 |
|
321 S = set((2, 3, 5, 7, 11, 13)) |
|
322 for i in S: |
|
323 print i |
|
324 |
|
325 |
|
326 |
|
327 Generator expressions and list comprehensions |
|
328 ============================================= |
|
329 |
|
330 Two common operations on an iterator's output are 1) performing some operation |
|
331 for every element, 2) selecting a subset of elements that meet some condition. |
|
332 For example, given a list of strings, you might want to strip off trailing |
|
333 whitespace from each line or extract all the strings containing a given |
|
334 substring. |
|
335 |
|
336 List comprehensions and generator expressions (short form: "listcomps" and |
|
337 "genexps") are a concise notation for such operations, borrowed from the |
|
338 functional programming language Haskell (http://www.haskell.org). You can strip |
|
339 all the whitespace from a stream of strings with the following code:: |
|
340 |
|
341 line_list = [' line 1\n', 'line 2 \n', ...] |
|
342 |
|
343 # Generator expression -- returns iterator |
|
344 stripped_iter = (line.strip() for line in line_list) |
|
345 |
|
346 # List comprehension -- returns list |
|
347 stripped_list = [line.strip() for line in line_list] |
|
348 |
|
349 You can select only certain elements by adding an ``"if"`` condition:: |
|
350 |
|
351 stripped_list = [line.strip() for line in line_list |
|
352 if line != ""] |
|
353 |
|
354 With a list comprehension, you get back a Python list; ``stripped_list`` is a |
|
355 list containing the resulting lines, not an iterator. Generator expressions |
|
356 return an iterator that computes the values as necessary, not needing to |
|
357 materialize all the values at once. This means that list comprehensions aren't |
|
358 useful if you're working with iterators that return an infinite stream or a very |
|
359 large amount of data. Generator expressions are preferable in these situations. |
|
360 |
|
361 Generator expressions are surrounded by parentheses ("()") and list |
|
362 comprehensions are surrounded by square brackets ("[]"). Generator expressions |
|
363 have the form:: |
|
364 |
|
365 ( expression for expr in sequence1 |
|
366 if condition1 |
|
367 for expr2 in sequence2 |
|
368 if condition2 |
|
369 for expr3 in sequence3 ... |
|
370 if condition3 |
|
371 for exprN in sequenceN |
|
372 if conditionN ) |
|
373 |
|
374 Again, for a list comprehension only the outside brackets are different (square |
|
375 brackets instead of parentheses). |
|
376 |
|
377 The elements of the generated output will be the successive values of |
|
378 ``expression``. The ``if`` clauses are all optional; if present, ``expression`` |
|
379 is only evaluated and added to the result when ``condition`` is true. |
|
380 |
|
381 Generator expressions always have to be written inside parentheses, but the |
|
382 parentheses signalling a function call also count. If you want to create an |
|
383 iterator that will be immediately passed to a function you can write:: |
|
384 |
|
385 obj_total = sum(obj.count for obj in list_all_objects()) |
|
386 |
|
387 The ``for...in`` clauses contain the sequences to be iterated over. The |
|
388 sequences do not have to be the same length, because they are iterated over from |
|
389 left to right, **not** in parallel. For each element in ``sequence1``, |
|
390 ``sequence2`` is looped over from the beginning. ``sequence3`` is then looped |
|
391 over for each resulting pair of elements from ``sequence1`` and ``sequence2``. |
|
392 |
|
393 To put it another way, a list comprehension or generator expression is |
|
394 equivalent to the following Python code:: |
|
395 |
|
396 for expr1 in sequence1: |
|
397 if not (condition1): |
|
398 continue # Skip this element |
|
399 for expr2 in sequence2: |
|
400 if not (condition2): |
|
401 continue # Skip this element |
|
402 ... |
|
403 for exprN in sequenceN: |
|
404 if not (conditionN): |
|
405 continue # Skip this element |
|
406 |
|
407 # Output the value of |
|
408 # the expression. |
|
409 |
|
410 This means that when there are multiple ``for...in`` clauses but no ``if`` |
|
411 clauses, the length of the resulting output will be equal to the product of the |
|
412 lengths of all the sequences. If you have two lists of length 3, the output |
|
413 list is 9 elements long: |
|
414 |
|
415 .. doctest:: |
|
416 :options: +NORMALIZE_WHITESPACE |
|
417 |
|
418 >>> seq1 = 'abc' |
|
419 >>> seq2 = (1,2,3) |
|
420 >>> [(x,y) for x in seq1 for y in seq2] |
|
421 [('a', 1), ('a', 2), ('a', 3), |
|
422 ('b', 1), ('b', 2), ('b', 3), |
|
423 ('c', 1), ('c', 2), ('c', 3)] |
|
424 |
|
425 To avoid introducing an ambiguity into Python's grammar, if ``expression`` is |
|
426 creating a tuple, it must be surrounded with parentheses. The first list |
|
427 comprehension below is a syntax error, while the second one is correct:: |
|
428 |
|
429 # Syntax error |
|
430 [ x,y for x in seq1 for y in seq2] |
|
431 # Correct |
|
432 [ (x,y) for x in seq1 for y in seq2] |
|
433 |
|
434 |
|
435 Generators |
|
436 ========== |
|
437 |
|
438 Generators are a special class of functions that simplify the task of writing |
|
439 iterators. Regular functions compute a value and return it, but generators |
|
440 return an iterator that returns a stream of values. |
|
441 |
|
442 You're doubtless familiar with how regular function calls work in Python or C. |
|
443 When you call a function, it gets a private namespace where its local variables |
|
444 are created. When the function reaches a ``return`` statement, the local |
|
445 variables are destroyed and the value is returned to the caller. A later call |
|
446 to the same function creates a new private namespace and a fresh set of local |
|
447 variables. But, what if the local variables weren't thrown away on exiting a |
|
448 function? What if you could later resume the function where it left off? This |
|
449 is what generators provide; they can be thought of as resumable functions. |
|
450 |
|
451 Here's the simplest example of a generator function: |
|
452 |
|
453 .. testcode:: |
|
454 |
|
455 def generate_ints(N): |
|
456 for i in range(N): |
|
457 yield i |
|
458 |
|
459 Any function containing a ``yield`` keyword is a generator function; this is |
|
460 detected by Python's :term:`bytecode` compiler which compiles the function |
|
461 specially as a result. |
|
462 |
|
463 When you call a generator function, it doesn't return a single value; instead it |
|
464 returns a generator object that supports the iterator protocol. On executing |
|
465 the ``yield`` expression, the generator outputs the value of ``i``, similar to a |
|
466 ``return`` statement. The big difference between ``yield`` and a ``return`` |
|
467 statement is that on reaching a ``yield`` the generator's state of execution is |
|
468 suspended and local variables are preserved. On the next call to the |
|
469 generator's ``.next()`` method, the function will resume executing. |
|
470 |
|
471 Here's a sample usage of the ``generate_ints()`` generator: |
|
472 |
|
473 >>> gen = generate_ints(3) |
|
474 >>> gen |
|
475 <generator object at ...> |
|
476 >>> gen.next() |
|
477 0 |
|
478 >>> gen.next() |
|
479 1 |
|
480 >>> gen.next() |
|
481 2 |
|
482 >>> gen.next() |
|
483 Traceback (most recent call last): |
|
484 File "stdin", line 1, in ? |
|
485 File "stdin", line 2, in generate_ints |
|
486 StopIteration |
|
487 |
|
488 You could equally write ``for i in generate_ints(5)``, or ``a,b,c = |
|
489 generate_ints(3)``. |
|
490 |
|
491 Inside a generator function, the ``return`` statement can only be used without a |
|
492 value, and signals the end of the procession of values; after executing a |
|
493 ``return`` the generator cannot return any further values. ``return`` with a |
|
494 value, such as ``return 5``, is a syntax error inside a generator function. The |
|
495 end of the generator's results can also be indicated by raising |
|
496 ``StopIteration`` manually, or by just letting the flow of execution fall off |
|
497 the bottom of the function. |
|
498 |
|
499 You could achieve the effect of generators manually by writing your own class |
|
500 and storing all the local variables of the generator as instance variables. For |
|
501 example, returning a list of integers could be done by setting ``self.count`` to |
|
502 0, and having the ``next()`` method increment ``self.count`` and return it. |
|
503 However, for a moderately complicated generator, writing a corresponding class |
|
504 can be much messier. |
|
505 |
|
506 The test suite included with Python's library, ``test_generators.py``, contains |
|
507 a number of more interesting examples. Here's one generator that implements an |
|
508 in-order traversal of a tree using generators recursively. :: |
|
509 |
|
510 # A recursive generator that generates Tree leaves in in-order. |
|
511 def inorder(t): |
|
512 if t: |
|
513 for x in inorder(t.left): |
|
514 yield x |
|
515 |
|
516 yield t.label |
|
517 |
|
518 for x in inorder(t.right): |
|
519 yield x |
|
520 |
|
521 Two other examples in ``test_generators.py`` produce solutions for the N-Queens |
|
522 problem (placing N queens on an NxN chess board so that no queen threatens |
|
523 another) and the Knight's Tour (finding a route that takes a knight to every |
|
524 square of an NxN chessboard without visiting any square twice). |
|
525 |
|
526 |
|
527 |
|
528 Passing values into a generator |
|
529 ------------------------------- |
|
530 |
|
531 In Python 2.4 and earlier, generators only produced output. Once a generator's |
|
532 code was invoked to create an iterator, there was no way to pass any new |
|
533 information into the function when its execution is resumed. You could hack |
|
534 together this ability by making the generator look at a global variable or by |
|
535 passing in some mutable object that callers then modify, but these approaches |
|
536 are messy. |
|
537 |
|
538 In Python 2.5 there's a simple way to pass values into a generator. |
|
539 :keyword:`yield` became an expression, returning a value that can be assigned to |
|
540 a variable or otherwise operated on:: |
|
541 |
|
542 val = (yield i) |
|
543 |
|
544 I recommend that you **always** put parentheses around a ``yield`` expression |
|
545 when you're doing something with the returned value, as in the above example. |
|
546 The parentheses aren't always necessary, but it's easier to always add them |
|
547 instead of having to remember when they're needed. |
|
548 |
|
549 (PEP 342 explains the exact rules, which are that a ``yield``-expression must |
|
550 always be parenthesized except when it occurs at the top-level expression on the |
|
551 right-hand side of an assignment. This means you can write ``val = yield i`` |
|
552 but have to use parentheses when there's an operation, as in ``val = (yield i) |
|
553 + 12``.) |
|
554 |
|
555 Values are sent into a generator by calling its ``send(value)`` method. This |
|
556 method resumes the generator's code and the ``yield`` expression returns the |
|
557 specified value. If the regular ``next()`` method is called, the ``yield`` |
|
558 returns ``None``. |
|
559 |
|
560 Here's a simple counter that increments by 1 and allows changing the value of |
|
561 the internal counter. |
|
562 |
|
563 .. testcode:: |
|
564 |
|
565 def counter (maximum): |
|
566 i = 0 |
|
567 while i < maximum: |
|
568 val = (yield i) |
|
569 # If value provided, change counter |
|
570 if val is not None: |
|
571 i = val |
|
572 else: |
|
573 i += 1 |
|
574 |
|
575 And here's an example of changing the counter: |
|
576 |
|
577 >>> it = counter(10) |
|
578 >>> print it.next() |
|
579 0 |
|
580 >>> print it.next() |
|
581 1 |
|
582 >>> print it.send(8) |
|
583 8 |
|
584 >>> print it.next() |
|
585 9 |
|
586 >>> print it.next() |
|
587 Traceback (most recent call last): |
|
588 File ``t.py'', line 15, in ? |
|
589 print it.next() |
|
590 StopIteration |
|
591 |
|
592 Because ``yield`` will often be returning ``None``, you should always check for |
|
593 this case. Don't just use its value in expressions unless you're sure that the |
|
594 ``send()`` method will be the only method used resume your generator function. |
|
595 |
|
596 In addition to ``send()``, there are two other new methods on generators: |
|
597 |
|
598 * ``throw(type, value=None, traceback=None)`` is used to raise an exception |
|
599 inside the generator; the exception is raised by the ``yield`` expression |
|
600 where the generator's execution is paused. |
|
601 |
|
602 * ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to |
|
603 terminate the iteration. On receiving this exception, the generator's code |
|
604 must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the |
|
605 exception and doing anything else is illegal and will trigger a |
|
606 :exc:`RuntimeError`. ``close()`` will also be called by Python's garbage |
|
607 collector when the generator is garbage-collected. |
|
608 |
|
609 If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest |
|
610 using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`. |
|
611 |
|
612 The cumulative effect of these changes is to turn generators from one-way |
|
613 producers of information into both producers and consumers. |
|
614 |
|
615 Generators also become **coroutines**, a more generalized form of subroutines. |
|
616 Subroutines are entered at one point and exited at another point (the top of the |
|
617 function, and a ``return`` statement), but coroutines can be entered, exited, |
|
618 and resumed at many different points (the ``yield`` statements). |
|
619 |
|
620 |
|
621 Built-in functions |
|
622 ================== |
|
623 |
|
624 Let's look in more detail at built-in functions often used with iterators. |
|
625 |
|
626 Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat |
|
627 obsolete; they duplicate the features of list comprehensions but return actual |
|
628 lists instead of iterators. |
|
629 |
|
630 ``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]), |
|
631 f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``. |
|
632 |
|
633 >>> def upper(s): |
|
634 ... return s.upper() |
|
635 |
|
636 >>> map(upper, ['sentence', 'fragment']) |
|
637 ['SENTENCE', 'FRAGMENT'] |
|
638 |
|
639 >>> [upper(s) for s in ['sentence', 'fragment']] |
|
640 ['SENTENCE', 'FRAGMENT'] |
|
641 |
|
642 As shown above, you can achieve the same effect with a list comprehension. The |
|
643 :func:`itertools.imap` function does the same thing but can handle infinite |
|
644 iterators; it'll be discussed later, in the section on the :mod:`itertools` module. |
|
645 |
|
646 ``filter(predicate, iter)`` returns a list that contains all the sequence |
|
647 elements that meet a certain condition, and is similarly duplicated by list |
|
648 comprehensions. A **predicate** is a function that returns the truth value of |
|
649 some condition; for use with :func:`filter`, the predicate must take a single |
|
650 value. |
|
651 |
|
652 >>> def is_even(x): |
|
653 ... return (x % 2) == 0 |
|
654 |
|
655 >>> filter(is_even, range(10)) |
|
656 [0, 2, 4, 6, 8] |
|
657 |
|
658 This can also be written as a list comprehension: |
|
659 |
|
660 >>> [x for x in range(10) if is_even(x)] |
|
661 [0, 2, 4, 6, 8] |
|
662 |
|
663 :func:`filter` also has a counterpart in the :mod:`itertools` module, |
|
664 :func:`itertools.ifilter`, that returns an iterator and can therefore handle |
|
665 infinite sequences just as :func:`itertools.imap` can. |
|
666 |
|
667 ``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the |
|
668 :mod:`itertools` module because it cumulatively performs an operation on all the |
|
669 iterable's elements and therefore can't be applied to infinite iterables. |
|
670 ``func`` must be a function that takes two elements and returns a single value. |
|
671 :func:`reduce` takes the first two elements A and B returned by the iterator and |
|
672 calculates ``func(A, B)``. It then requests the third element, C, calculates |
|
673 ``func(func(A, B), C)``, combines this result with the fourth element returned, |
|
674 and continues until the iterable is exhausted. If the iterable returns no |
|
675 values at all, a :exc:`TypeError` exception is raised. If the initial value is |
|
676 supplied, it's used as a starting point and ``func(initial_value, A)`` is the |
|
677 first calculation. |
|
678 |
|
679 >>> import operator |
|
680 >>> reduce(operator.concat, ['A', 'BB', 'C']) |
|
681 'ABBC' |
|
682 >>> reduce(operator.concat, []) |
|
683 Traceback (most recent call last): |
|
684 ... |
|
685 TypeError: reduce() of empty sequence with no initial value |
|
686 >>> reduce(operator.mul, [1,2,3], 1) |
|
687 6 |
|
688 >>> reduce(operator.mul, [], 1) |
|
689 1 |
|
690 |
|
691 If you use :func:`operator.add` with :func:`reduce`, you'll add up all the |
|
692 elements of the iterable. This case is so common that there's a special |
|
693 built-in called :func:`sum` to compute it: |
|
694 |
|
695 >>> reduce(operator.add, [1,2,3,4], 0) |
|
696 10 |
|
697 >>> sum([1,2,3,4]) |
|
698 10 |
|
699 >>> sum([]) |
|
700 0 |
|
701 |
|
702 For many uses of :func:`reduce`, though, it can be clearer to just write the |
|
703 obvious :keyword:`for` loop:: |
|
704 |
|
705 # Instead of: |
|
706 product = reduce(operator.mul, [1,2,3], 1) |
|
707 |
|
708 # You can write: |
|
709 product = 1 |
|
710 for i in [1,2,3]: |
|
711 product *= i |
|
712 |
|
713 |
|
714 ``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples |
|
715 containing the count and each element. |
|
716 |
|
717 >>> for item in enumerate(['subject', 'verb', 'object']): |
|
718 ... print item |
|
719 (0, 'subject') |
|
720 (1, 'verb') |
|
721 (2, 'object') |
|
722 |
|
723 :func:`enumerate` is often used when looping through a list and recording the |
|
724 indexes at which certain conditions are met:: |
|
725 |
|
726 f = open('data.txt', 'r') |
|
727 for i, line in enumerate(f): |
|
728 if line.strip() == '': |
|
729 print 'Blank line at line #%i' % i |
|
730 |
|
731 ``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the |
|
732 elements of the iterable into a list, sorts the list, and returns the sorted |
|
733 result. The ``cmp``, ``key``, and ``reverse`` arguments are passed through to |
|
734 the constructed list's ``.sort()`` method. :: |
|
735 |
|
736 >>> import random |
|
737 >>> # Generate 8 random numbers between [0, 10000) |
|
738 >>> rand_list = random.sample(range(10000), 8) |
|
739 >>> rand_list |
|
740 [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207] |
|
741 >>> sorted(rand_list) |
|
742 [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878] |
|
743 >>> sorted(rand_list, reverse=True) |
|
744 [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769] |
|
745 |
|
746 (For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the |
|
747 Python wiki at http://wiki.python.org/moin/HowTo/Sorting.) |
|
748 |
|
749 The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an |
|
750 iterable's contents. :func:`any` returns True if any element in the iterable is |
|
751 a true value, and :func:`all` returns True if all of the elements are true |
|
752 values: |
|
753 |
|
754 >>> any([0,1,0]) |
|
755 True |
|
756 >>> any([0,0,0]) |
|
757 False |
|
758 >>> any([1,1,1]) |
|
759 True |
|
760 >>> all([0,1,0]) |
|
761 False |
|
762 >>> all([0,0,0]) |
|
763 False |
|
764 >>> all([1,1,1]) |
|
765 True |
|
766 |
|
767 |
|
768 Small functions and the lambda expression |
|
769 ========================================= |
|
770 |
|
771 When writing functional-style programs, you'll often need little functions that |
|
772 act as predicates or that combine elements in some way. |
|
773 |
|
774 If there's a Python built-in or a module function that's suitable, you don't |
|
775 need to define a new function at all:: |
|
776 |
|
777 stripped_lines = [line.strip() for line in lines] |
|
778 existing_files = filter(os.path.exists, file_list) |
|
779 |
|
780 If the function you need doesn't exist, you need to write it. One way to write |
|
781 small functions is to use the ``lambda`` statement. ``lambda`` takes a number |
|
782 of parameters and an expression combining these parameters, and creates a small |
|
783 function that returns the value of the expression:: |
|
784 |
|
785 lowercase = lambda x: x.lower() |
|
786 |
|
787 print_assign = lambda name, value: name + '=' + str(value) |
|
788 |
|
789 adder = lambda x, y: x+y |
|
790 |
|
791 An alternative is to just use the ``def`` statement and define a function in the |
|
792 usual way:: |
|
793 |
|
794 def lowercase(x): |
|
795 return x.lower() |
|
796 |
|
797 def print_assign(name, value): |
|
798 return name + '=' + str(value) |
|
799 |
|
800 def adder(x,y): |
|
801 return x + y |
|
802 |
|
803 Which alternative is preferable? That's a style question; my usual course is to |
|
804 avoid using ``lambda``. |
|
805 |
|
806 One reason for my preference is that ``lambda`` is quite limited in the |
|
807 functions it can define. The result has to be computable as a single |
|
808 expression, which means you can't have multiway ``if... elif... else`` |
|
809 comparisons or ``try... except`` statements. If you try to do too much in a |
|
810 ``lambda`` statement, you'll end up with an overly complicated expression that's |
|
811 hard to read. Quick, what's the following code doing? |
|
812 |
|
813 :: |
|
814 |
|
815 total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1] |
|
816 |
|
817 You can figure it out, but it takes time to disentangle the expression to figure |
|
818 out what's going on. Using a short nested ``def`` statements makes things a |
|
819 little bit better:: |
|
820 |
|
821 def combine (a, b): |
|
822 return 0, a[1] + b[1] |
|
823 |
|
824 total = reduce(combine, items)[1] |
|
825 |
|
826 But it would be best of all if I had simply used a ``for`` loop:: |
|
827 |
|
828 total = 0 |
|
829 for a, b in items: |
|
830 total += b |
|
831 |
|
832 Or the :func:`sum` built-in and a generator expression:: |
|
833 |
|
834 total = sum(b for a,b in items) |
|
835 |
|
836 Many uses of :func:`reduce` are clearer when written as ``for`` loops. |
|
837 |
|
838 Fredrik Lundh once suggested the following set of rules for refactoring uses of |
|
839 ``lambda``: |
|
840 |
|
841 1) Write a lambda function. |
|
842 2) Write a comment explaining what the heck that lambda does. |
|
843 3) Study the comment for a while, and think of a name that captures the essence |
|
844 of the comment. |
|
845 4) Convert the lambda to a def statement, using that name. |
|
846 5) Remove the comment. |
|
847 |
|
848 I really like these rules, but you're free to disagree |
|
849 about whether this lambda-free style is better. |
|
850 |
|
851 |
|
852 The itertools module |
|
853 ==================== |
|
854 |
|
855 The :mod:`itertools` module contains a number of commonly-used iterators as well |
|
856 as functions for combining several iterators. This section will introduce the |
|
857 module's contents by showing small examples. |
|
858 |
|
859 The module's functions fall into a few broad classes: |
|
860 |
|
861 * Functions that create a new iterator based on an existing iterator. |
|
862 * Functions for treating an iterator's elements as function arguments. |
|
863 * Functions for selecting portions of an iterator's output. |
|
864 * A function for grouping an iterator's output. |
|
865 |
|
866 Creating new iterators |
|
867 ---------------------- |
|
868 |
|
869 ``itertools.count(n)`` returns an infinite stream of integers, increasing by 1 |
|
870 each time. You can optionally supply the starting number, which defaults to 0:: |
|
871 |
|
872 itertools.count() => |
|
873 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... |
|
874 itertools.count(10) => |
|
875 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... |
|
876 |
|
877 ``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable |
|
878 and returns a new iterator that returns its elements from first to last. The |
|
879 new iterator will repeat these elements infinitely. :: |
|
880 |
|
881 itertools.cycle([1,2,3,4,5]) => |
|
882 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ... |
|
883 |
|
884 ``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or |
|
885 returns the element endlessly if ``n`` is not provided. :: |
|
886 |
|
887 itertools.repeat('abc') => |
|
888 abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ... |
|
889 itertools.repeat('abc', 5) => |
|
890 abc, abc, abc, abc, abc |
|
891 |
|
892 ``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as |
|
893 input, and returns all the elements of the first iterator, then all the elements |
|
894 of the second, and so on, until all of the iterables have been exhausted. :: |
|
895 |
|
896 itertools.chain(['a', 'b', 'c'], (1, 2, 3)) => |
|
897 a, b, c, 1, 2, 3 |
|
898 |
|
899 ``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and |
|
900 returns them in a tuple:: |
|
901 |
|
902 itertools.izip(['a', 'b', 'c'], (1, 2, 3)) => |
|
903 ('a', 1), ('b', 2), ('c', 3) |
|
904 |
|
905 It's similar to the built-in :func:`zip` function, but doesn't construct an |
|
906 in-memory list and exhaust all the input iterators before returning; instead |
|
907 tuples are constructed and returned only if they're requested. (The technical |
|
908 term for this behaviour is `lazy evaluation |
|
909 <http://en.wikipedia.org/wiki/Lazy_evaluation>`__.) |
|
910 |
|
911 This iterator is intended to be used with iterables that are all of the same |
|
912 length. If the iterables are of different lengths, the resulting stream will be |
|
913 the same length as the shortest iterable. :: |
|
914 |
|
915 itertools.izip(['a', 'b'], (1, 2, 3)) => |
|
916 ('a', 1), ('b', 2) |
|
917 |
|
918 You should avoid doing this, though, because an element may be taken from the |
|
919 longer iterators and discarded. This means you can't go on to use the iterators |
|
920 further because you risk skipping a discarded element. |
|
921 |
|
922 ``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a |
|
923 slice of the iterator. With a single ``stop`` argument, it will return the |
|
924 first ``stop`` elements. If you supply a starting index, you'll get |
|
925 ``stop-start`` elements, and if you supply a value for ``step``, elements will |
|
926 be skipped accordingly. Unlike Python's string and list slicing, you can't use |
|
927 negative values for ``start``, ``stop``, or ``step``. :: |
|
928 |
|
929 itertools.islice(range(10), 8) => |
|
930 0, 1, 2, 3, 4, 5, 6, 7 |
|
931 itertools.islice(range(10), 2, 8) => |
|
932 2, 3, 4, 5, 6, 7 |
|
933 itertools.islice(range(10), 2, 8, 2) => |
|
934 2, 4, 6 |
|
935 |
|
936 ``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n`` |
|
937 independent iterators that will all return the contents of the source iterator. |
|
938 If you don't supply a value for ``n``, the default is 2. Replicating iterators |
|
939 requires saving some of the contents of the source iterator, so this can consume |
|
940 significant memory if the iterator is large and one of the new iterators is |
|
941 consumed more than the others. :: |
|
942 |
|
943 itertools.tee( itertools.count() ) => |
|
944 iterA, iterB |
|
945 |
|
946 where iterA -> |
|
947 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... |
|
948 |
|
949 and iterB -> |
|
950 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... |
|
951 |
|
952 |
|
953 Calling functions on elements |
|
954 ----------------------------- |
|
955 |
|
956 Two functions are used for calling other functions on the contents of an |
|
957 iterable. |
|
958 |
|
959 ``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing |
|
960 ``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``:: |
|
961 |
|
962 itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) => |
|
963 6, 8, 8 |
|
964 |
|
965 The ``operator`` module contains a set of functions corresponding to Python's |
|
966 operators. Some examples are ``operator.add(a, b)`` (adds two values), |
|
967 ``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')`` |
|
968 (returns a callable that fetches the ``"id"`` attribute). |
|
969 |
|
970 ``itertools.starmap(func, iter)`` assumes that the iterable will return a stream |
|
971 of tuples, and calls ``f()`` using these tuples as the arguments:: |
|
972 |
|
973 itertools.starmap(os.path.join, |
|
974 [('/usr', 'bin', 'java'), ('/bin', 'python'), |
|
975 ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')]) |
|
976 => |
|
977 /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby |
|
978 |
|
979 |
|
980 Selecting elements |
|
981 ------------------ |
|
982 |
|
983 Another group of functions chooses a subset of an iterator's elements based on a |
|
984 predicate. |
|
985 |
|
986 ``itertools.ifilter(predicate, iter)`` returns all the elements for which the |
|
987 predicate returns true:: |
|
988 |
|
989 def is_even(x): |
|
990 return (x % 2) == 0 |
|
991 |
|
992 itertools.ifilter(is_even, itertools.count()) => |
|
993 0, 2, 4, 6, 8, 10, 12, 14, ... |
|
994 |
|
995 ``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all |
|
996 elements for which the predicate returns false:: |
|
997 |
|
998 itertools.ifilterfalse(is_even, itertools.count()) => |
|
999 1, 3, 5, 7, 9, 11, 13, 15, ... |
|
1000 |
|
1001 ``itertools.takewhile(predicate, iter)`` returns elements for as long as the |
|
1002 predicate returns true. Once the predicate returns false, the iterator will |
|
1003 signal the end of its results. |
|
1004 |
|
1005 :: |
|
1006 |
|
1007 def less_than_10(x): |
|
1008 return (x < 10) |
|
1009 |
|
1010 itertools.takewhile(less_than_10, itertools.count()) => |
|
1011 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
|
1012 |
|
1013 itertools.takewhile(is_even, itertools.count()) => |
|
1014 0 |
|
1015 |
|
1016 ``itertools.dropwhile(predicate, iter)`` discards elements while the predicate |
|
1017 returns true, and then returns the rest of the iterable's results. |
|
1018 |
|
1019 :: |
|
1020 |
|
1021 itertools.dropwhile(less_than_10, itertools.count()) => |
|
1022 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... |
|
1023 |
|
1024 itertools.dropwhile(is_even, itertools.count()) => |
|
1025 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... |
|
1026 |
|
1027 |
|
1028 Grouping elements |
|
1029 ----------------- |
|
1030 |
|
1031 The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is |
|
1032 the most complicated. ``key_func(elem)`` is a function that can compute a key |
|
1033 value for each element returned by the iterable. If you don't supply a key |
|
1034 function, the key is simply each element itself. |
|
1035 |
|
1036 ``groupby()`` collects all the consecutive elements from the underlying iterable |
|
1037 that have the same key value, and returns a stream of 2-tuples containing a key |
|
1038 value and an iterator for the elements with that key. |
|
1039 |
|
1040 :: |
|
1041 |
|
1042 city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), |
|
1043 ('Anchorage', 'AK'), ('Nome', 'AK'), |
|
1044 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), |
|
1045 ... |
|
1046 ] |
|
1047 |
|
1048 def get_state ((city, state)): |
|
1049 return state |
|
1050 |
|
1051 itertools.groupby(city_list, get_state) => |
|
1052 ('AL', iterator-1), |
|
1053 ('AK', iterator-2), |
|
1054 ('AZ', iterator-3), ... |
|
1055 |
|
1056 where |
|
1057 iterator-1 => |
|
1058 ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL') |
|
1059 iterator-2 => |
|
1060 ('Anchorage', 'AK'), ('Nome', 'AK') |
|
1061 iterator-3 => |
|
1062 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ') |
|
1063 |
|
1064 ``groupby()`` assumes that the underlying iterable's contents will already be |
|
1065 sorted based on the key. Note that the returned iterators also use the |
|
1066 underlying iterable, so you have to consume the results of iterator-1 before |
|
1067 requesting iterator-2 and its corresponding key. |
|
1068 |
|
1069 |
|
1070 The functools module |
|
1071 ==================== |
|
1072 |
|
1073 The :mod:`functools` module in Python 2.5 contains some higher-order functions. |
|
1074 A **higher-order function** takes one or more functions as input and returns a |
|
1075 new function. The most useful tool in this module is the |
|
1076 :func:`functools.partial` function. |
|
1077 |
|
1078 For programs written in a functional style, you'll sometimes want to construct |
|
1079 variants of existing functions that have some of the parameters filled in. |
|
1080 Consider a Python function ``f(a, b, c)``; you may wish to create a new function |
|
1081 ``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for |
|
1082 one of ``f()``'s parameters. This is called "partial function application". |
|
1083 |
|
1084 The constructor for ``partial`` takes the arguments ``(function, arg1, arg2, |
|
1085 ... kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you |
|
1086 can just call it to invoke ``function`` with the filled-in arguments. |
|
1087 |
|
1088 Here's a small but realistic example:: |
|
1089 |
|
1090 import functools |
|
1091 |
|
1092 def log (message, subsystem): |
|
1093 "Write the contents of 'message' to the specified subsystem." |
|
1094 print '%s: %s' % (subsystem, message) |
|
1095 ... |
|
1096 |
|
1097 server_log = functools.partial(log, subsystem='server') |
|
1098 server_log('Unable to open socket') |
|
1099 |
|
1100 |
|
1101 The operator module |
|
1102 ------------------- |
|
1103 |
|
1104 The :mod:`operator` module was mentioned earlier. It contains a set of |
|
1105 functions corresponding to Python's operators. These functions are often useful |
|
1106 in functional-style code because they save you from writing trivial functions |
|
1107 that perform a single operation. |
|
1108 |
|
1109 Some of the functions in this module are: |
|
1110 |
|
1111 * Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``, |
|
1112 ``abs()``, ... |
|
1113 * Logical operations: ``not_()``, ``truth()``. |
|
1114 * Bitwise operations: ``and_()``, ``or_()``, ``invert()``. |
|
1115 * Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``. |
|
1116 * Object identity: ``is_()``, ``is_not()``. |
|
1117 |
|
1118 Consult the operator module's documentation for a complete list. |
|
1119 |
|
1120 |
|
1121 |
|
1122 The functional module |
|
1123 --------------------- |
|
1124 |
|
1125 Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__ |
|
1126 provides a number of more advanced tools for functional programming. It also |
|
1127 reimplements several Python built-ins, trying to make them more intuitive to |
|
1128 those used to functional programming in other languages. |
|
1129 |
|
1130 This section contains an introduction to some of the most important functions in |
|
1131 ``functional``; full documentation can be found at `the project's website |
|
1132 <http://oakwinter.com/code/functional/documentation/>`__. |
|
1133 |
|
1134 ``compose(outer, inner, unpack=False)`` |
|
1135 |
|
1136 The ``compose()`` function implements function composition. In other words, it |
|
1137 returns a wrapper around the ``outer`` and ``inner`` callables, such that the |
|
1138 return value from ``inner`` is fed directly to ``outer``. That is, :: |
|
1139 |
|
1140 >>> def add(a, b): |
|
1141 ... return a + b |
|
1142 ... |
|
1143 >>> def double(a): |
|
1144 ... return 2 * a |
|
1145 ... |
|
1146 >>> compose(double, add)(5, 6) |
|
1147 22 |
|
1148 |
|
1149 is equivalent to :: |
|
1150 |
|
1151 >>> double(add(5, 6)) |
|
1152 22 |
|
1153 |
|
1154 The ``unpack`` keyword is provided to work around the fact that Python functions |
|
1155 are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__. By |
|
1156 default, it is expected that the ``inner`` function will return a single object |
|
1157 and that the ``outer`` function will take a single argument. Setting the |
|
1158 ``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which |
|
1159 will be expanded before being passed to ``outer``. Put simply, :: |
|
1160 |
|
1161 compose(f, g)(5, 6) |
|
1162 |
|
1163 is equivalent to:: |
|
1164 |
|
1165 f(g(5, 6)) |
|
1166 |
|
1167 while :: |
|
1168 |
|
1169 compose(f, g, unpack=True)(5, 6) |
|
1170 |
|
1171 is equivalent to:: |
|
1172 |
|
1173 f(*g(5, 6)) |
|
1174 |
|
1175 Even though ``compose()`` only accepts two functions, it's trivial to build up a |
|
1176 version that will compose any number of functions. We'll use ``reduce()``, |
|
1177 ``compose()`` and ``partial()`` (the last of which is provided by both |
|
1178 ``functional`` and ``functools``). :: |
|
1179 |
|
1180 from functional import compose, partial |
|
1181 |
|
1182 multi_compose = partial(reduce, compose) |
|
1183 |
|
1184 |
|
1185 We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of |
|
1186 ``"".join(...)`` that converts its arguments to string:: |
|
1187 |
|
1188 from functional import compose, partial |
|
1189 |
|
1190 join = compose("".join, partial(map, str)) |
|
1191 |
|
1192 |
|
1193 ``flip(func)`` |
|
1194 |
|
1195 ``flip()`` wraps the callable in ``func`` and causes it to receive its |
|
1196 non-keyword arguments in reverse order. :: |
|
1197 |
|
1198 >>> def triple(a, b, c): |
|
1199 ... return (a, b, c) |
|
1200 ... |
|
1201 >>> triple(5, 6, 7) |
|
1202 (5, 6, 7) |
|
1203 >>> |
|
1204 >>> flipped_triple = flip(triple) |
|
1205 >>> flipped_triple(5, 6, 7) |
|
1206 (7, 6, 5) |
|
1207 |
|
1208 ``foldl(func, start, iterable)`` |
|
1209 |
|
1210 ``foldl()`` takes a binary function, a starting value (usually some kind of |
|
1211 'zero'), and an iterable. The function is applied to the starting value and the |
|
1212 first element of the list, then the result of that and the second element of the |
|
1213 list, then the result of that and the third element of the list, and so on. |
|
1214 |
|
1215 This means that a call such as:: |
|
1216 |
|
1217 foldl(f, 0, [1, 2, 3]) |
|
1218 |
|
1219 is equivalent to:: |
|
1220 |
|
1221 f(f(f(0, 1), 2), 3) |
|
1222 |
|
1223 |
|
1224 ``foldl()`` is roughly equivalent to the following recursive function:: |
|
1225 |
|
1226 def foldl(func, start, seq): |
|
1227 if len(seq) == 0: |
|
1228 return start |
|
1229 |
|
1230 return foldl(func, func(start, seq[0]), seq[1:]) |
|
1231 |
|
1232 Speaking of equivalence, the above ``foldl`` call can be expressed in terms of |
|
1233 the built-in ``reduce`` like so:: |
|
1234 |
|
1235 reduce(f, [1, 2, 3], 0) |
|
1236 |
|
1237 |
|
1238 We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a |
|
1239 cleaner, more aesthetically-pleasing version of Python's ``"".join(...)`` |
|
1240 idiom:: |
|
1241 |
|
1242 from functional import foldl, partial from operator import concat |
|
1243 |
|
1244 join = partial(foldl, concat, "") |
|
1245 |
|
1246 |
|
1247 Revision History and Acknowledgements |
|
1248 ===================================== |
|
1249 |
|
1250 The author would like to thank the following people for offering suggestions, |
|
1251 corrections and assistance with various drafts of this article: Ian Bicking, |
|
1252 Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro |
|
1253 Lameiro, Jussi Salmela, Collin Winter, Blake Winton. |
|
1254 |
|
1255 Version 0.1: posted June 30 2006. |
|
1256 |
|
1257 Version 0.11: posted July 1 2006. Typo fixes. |
|
1258 |
|
1259 Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one. |
|
1260 Typo fixes. |
|
1261 |
|
1262 Version 0.21: Added more references suggested on the tutor mailing list. |
|
1263 |
|
1264 Version 0.30: Adds a section on the ``functional`` module written by Collin |
|
1265 Winter; adds short section on the operator module; a few other edits. |
|
1266 |
|
1267 |
|
1268 References |
|
1269 ========== |
|
1270 |
|
1271 General |
|
1272 ------- |
|
1273 |
|
1274 **Structure and Interpretation of Computer Programs**, by Harold Abelson and |
|
1275 Gerald Jay Sussman with Julie Sussman. Full text at |
|
1276 http://mitpress.mit.edu/sicp/. In this classic textbook of computer science, |
|
1277 chapters 2 and 3 discuss the use of sequences and streams to organize the data |
|
1278 flow inside a program. The book uses Scheme for its examples, but many of the |
|
1279 design approaches described in these chapters are applicable to functional-style |
|
1280 Python code. |
|
1281 |
|
1282 http://www.defmacro.org/ramblings/fp.html: A general introduction to functional |
|
1283 programming that uses Java examples and has a lengthy historical introduction. |
|
1284 |
|
1285 http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry |
|
1286 describing functional programming. |
|
1287 |
|
1288 http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines. |
|
1289 |
|
1290 http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying. |
|
1291 |
|
1292 Python-specific |
|
1293 --------------- |
|
1294 |
|
1295 http://gnosis.cx/TPiP/: The first chapter of David Mertz's book |
|
1296 :title-reference:`Text Processing in Python` discusses functional programming |
|
1297 for text processing, in the section titled "Utilizing Higher-Order Functions in |
|
1298 Text Processing". |
|
1299 |
|
1300 Mertz also wrote a 3-part series of articles on functional programming |
|
1301 for IBM's DeveloperWorks site; see |
|
1302 `part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__, |
|
1303 `part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and |
|
1304 `part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__, |
|
1305 |
|
1306 |
|
1307 Python documentation |
|
1308 -------------------- |
|
1309 |
|
1310 Documentation for the :mod:`itertools` module. |
|
1311 |
|
1312 Documentation for the :mod:`operator` module. |
|
1313 |
|
1314 :pep:`289`: "Generator Expressions" |
|
1315 |
|
1316 :pep:`342`: "Coroutines via Enhanced Generators" describes the new generator |
|
1317 features in Python 2.5. |
|
1318 |
|
1319 .. comment |
|
1320 |
|
1321 Topics to place |
|
1322 ----------------------------- |
|
1323 |
|
1324 XXX os.walk() |
|
1325 |
|
1326 XXX Need a large example. |
|
1327 |
|
1328 But will an example add much? I'll post a first draft and see |
|
1329 what the comments say. |
|
1330 |
|
1331 .. comment |
|
1332 |
|
1333 Original outline: |
|
1334 Introduction |
|
1335 Idea of FP |
|
1336 Programs built out of functions |
|
1337 Functions are strictly input-output, no internal state |
|
1338 Opposed to OO programming, where objects have state |
|
1339 |
|
1340 Why FP? |
|
1341 Formal provability |
|
1342 Assignment is difficult to reason about |
|
1343 Not very relevant to Python |
|
1344 Modularity |
|
1345 Small functions that do one thing |
|
1346 Debuggability: |
|
1347 Easy to test due to lack of state |
|
1348 Easy to verify output from intermediate steps |
|
1349 Composability |
|
1350 You assemble a toolbox of functions that can be mixed |
|
1351 |
|
1352 Tackling a problem |
|
1353 Need a significant example |
|
1354 |
|
1355 Iterators |
|
1356 Generators |
|
1357 The itertools module |
|
1358 List comprehensions |
|
1359 Small functions and the lambda statement |
|
1360 Built-in functions |
|
1361 map |
|
1362 filter |
|
1363 reduce |
|
1364 |
|
1365 .. comment |
|
1366 |
|
1367 Handy little function for printing part of an iterator -- used |
|
1368 while writing this document. |
|
1369 |
|
1370 import itertools |
|
1371 def print_iter(it): |
|
1372 slice = itertools.islice(it, 10) |
|
1373 for elem in slice[:-1]: |
|
1374 sys.stdout.write(str(elem)) |
|
1375 sys.stdout.write(', ') |
|
1376 print elem[-1] |
|
1377 |
|
1378 |