:mod:`multifile` --- Support for files containing distinct parts
================================================================

.. module:: multifile
   :synopsis: Support for reading files which contain distinct parts, such as some MIME data.
   :deprecated:
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>


.. deprecated:: 2.5
   The :mod:`email` package should be used in preference to the :mod:`multifile`
   module.  This module is present only to maintain backward compatibility.

The :class:`MultiFile` object enables you to treat sections of a text file as
file-like input objects, with ``''`` being returned by :meth:`readline` when a
given delimiter pattern is encountered.  The defaults of this class are designed
to make it useful for parsing MIME multipart messages, but by subclassing it and
overriding methods it can be easily adapted for more general use.


.. class:: MultiFile(fp[, seekable])

   Create a multi-file.  You must instantiate this class with an input object
   argument for the :class:`MultiFile` instance to get lines from, such as a file
   object returned by :func:`open`.

   :class:`MultiFile` only ever looks at the input object's :meth:`readline`,
   :meth:`seek` and :meth:`tell` methods, and the latter two are only needed if you
   want random access to the individual MIME parts.  To use :class:`MultiFile` on a
   non-seekable stream object, set the optional *seekable* argument to false; this
   prevents :class:`MultiFile` from using the input object's :meth:`seek` and
   :meth:`tell` methods.

   In :class:`MultiFile`'s view of the world, text is composed of three kinds of
   lines: data, section-dividers, and end-markers.  :class:`MultiFile` is designed
   to support parsing of messages that may have multiple nested message parts,
   each with its own pattern for section-divider and end-marker lines.
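
As a rough illustration of the three kinds of lines, the sketch below classifies
input lines against a single boundary.  It assumes the default MIME decorations
(a leading ``--`` for a section-divider, plus a trailing ``--`` for an
end-marker).  The ``classify_line`` function and the boundary ``XYZ`` are
hypothetical names chosen for this example, and the code does not use the
:mod:`multifile` module itself.

```python
def classify_line(line, boundary):
    """Classify *line* as 'data', 'section-divider' or 'end-marker'.

    Illustrative only: assumes MIME-style decoration of *boundary*.
    """
    stripped = line.rstrip()  # trailing whitespace (LF, CR-LF) is ignored
    if stripped == "--" + boundary + "--":
        return "end-marker"
    if stripped == "--" + boundary:
        return "section-divider"
    return "data"


# Hypothetical two-part message body using the boundary "XYZ".
lines = [
    "--XYZ\r\n",
    "first part\n",
    "--XYZ\n",
    "second part\n",
    "--XYZ--\n",
]
kinds = [classify_line(line, "XYZ") for line in lines]
```

Each nested multipart message brings its own boundary, so a real parser has to
apply this kind of test once per boundary that is currently active.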


.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`multifile` module.


.. _multifile-objects:

MultiFile Objects
-----------------

A :class:`MultiFile` instance has the following methods:


.. method:: MultiFile.readline()

   Read a line.  If the line is data (not a section-divider or end-marker or real
   EOF) return it.  If the line matches the most-recently-stacked boundary, return
   ``''`` and set ``self.last`` to 1 if the match is an end-marker or to 0 if it
   is a section-divider.  If the line matches any other stacked boundary, raise an
   error.  On encountering end-of-file on the underlying stream object, the method
   raises :exc:`Error` unless all boundaries have been popped.


.. method:: MultiFile.readlines()

   Return all lines remaining in this part as a list of strings.


.. method:: MultiFile.read()

   Read all lines, up to the next section.  Return them as a single (multiline)
   string.  Note that this doesn't take a size argument!


.. method:: MultiFile.seek(pos[, whence])

   Seek.  Seek indices are relative to the start of the current section.  The *pos*
   and *whence* arguments are interpreted as for a file seek.


.. method:: MultiFile.tell()

   Return the file position relative to the start of the current section.


.. method:: MultiFile.next()

   Skip lines to the next section (that is, read lines until a section-divider or
   end-marker has been consumed).  Return true if there is such a section, false if
   an end-marker is seen.  Re-enable the most-recently-pushed boundary.


.. method:: MultiFile.is_data(str)

   Return true if *str* is data and false if it might be a section boundary.  As
   written, it tests for a prefix other than ``'--'`` at start of line (which
   all MIME boundaries have) but it is declared so it can be overridden in derived
   classes.

   Note that this test is intended as a fast guard for the real boundary
   tests; if it always returns false it will merely slow processing, not cause it
   to fail.


.. method:: MultiFile.push(str)

   Push a boundary string.  When a decorated version of this boundary is found as
   an input line, it will be interpreted as a section-divider or end-marker
   (depending on the decoration, see :rfc:`2045`).  All subsequent reads will
   return the empty string to indicate end-of-file, until a call to :meth:`pop`
   removes the boundary or a :meth:`next` call re-enables it.

   It is possible to push more than one boundary.  Encountering the
   most-recently-pushed boundary will return EOF; encountering any other
   boundary will raise an error.
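
The stacking rule can be sketched without the module.  The hypothetical
``read_part`` helper below is a simplified model, not the module's
implementation: it collects lines until the most recently pushed boundary
appears, and raises ``ValueError`` (a stand-in for the module's :exc:`Error`)
if an outer boundary turns up first.  The boundary names and sample text are
made up for the example.

```python
import io


def read_part(fp, stack):
    """Return the data lines up to the innermost boundary in *stack*.

    Simplified model: *stack* is a list of boundary strings, innermost last,
    decorated MIME-style ('--' prefix, extra '--' suffix for an end-marker).
    """
    lines = []
    while True:
        line = fp.readline()
        if not line:
            raise ValueError("end of input before boundary")
        stripped = line.rstrip()
        for depth, boundary in enumerate(reversed(stack)):
            if stripped in ("--" + boundary, "--" + boundary + "--"):
                if depth == 0:
                    return lines  # innermost boundary: end of this part
                raise ValueError("unexpected outer boundary: " + boundary)
        lines.append(line)


text = "inner data\n--inner\nmore\n--outer\n"
fp = io.StringIO(text)
first = read_part(fp, ["outer", "inner"])   # stops at "--inner"
try:
    read_part(fp, ["outer", "inner"])       # meets "--outer" first
    outer_error = False
except ValueError:
    outer_error = True
```

Treating an outer boundary as an error reflects the idea that an inner part
must be closed by its own end-marker before the enclosing part can continue.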


.. method:: MultiFile.pop()

   Pop a section boundary.  This boundary will no longer be interpreted as EOF.


.. method:: MultiFile.section_divider(str)

   Turn a boundary into a section-divider line.  By default, this method
   prepends ``'--'`` (which MIME section boundaries have) but it is declared so
   it can be overridden in derived classes.  This method need not append LF or
   CR-LF, as comparison with the result ignores trailing whitespace.


.. method:: MultiFile.end_marker(str)

   Turn a boundary string into an end-marker line.  By default, this method
   prepends ``'--'`` and appends ``'--'`` (like a MIME-multipart end-of-message
   marker) but it is declared so it can be overridden in derived classes.  This
   method need not append LF or CR-LF, as comparison with the result ignores
   trailing whitespace.

Finally, :class:`MultiFile` instances have two public instance variables:


.. attribute:: MultiFile.level

   Nesting depth of the current part.


.. attribute:: MultiFile.last

   True if the last end-of-file was for an end-of-message marker.


.. _multifile-example:

:class:`MultiFile` Example
--------------------------

.. sectionauthor:: Skip Montanaro <skip@pobox.com>


::

   import mimetools
   import multifile
   import StringIO

   def extract_mime_part_matching(stream, mimetype):
       """Return the first element in a multipart MIME message on stream
       matching mimetype, or '' if no part matches."""

       # Parse the top-level headers; this consumes them from the stream.
       msg = mimetools.Message(stream)
       msgtype = msg.gettype()

       result = ""
       if msgtype[:10] == "multipart/":
           mfile = multifile.MultiFile(stream)
           # From here on, reads stop at lines built from the boundary.
           mfile.push(msg.getparam("boundary"))
           while mfile.next():
               # Each part has its own headers followed by an encoded body.
               submsg = mimetools.Message(mfile)
               data = StringIO.StringIO()
               try:
                   mimetools.decode(mfile, data, submsg.getencoding())
               except ValueError:
                   # Unknown transfer encoding: skip this part.
                   continue
               if submsg.gettype() == mimetype:
                   result = data.getvalue()
                   break
           mfile.pop()
       return result
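
Because the :mod:`email` package supersedes this module, a comparable lookup
can be sketched with it.  The version below is an approximation under stated
assumptions, not a drop-in replacement: it takes the whole message as a string
rather than a stream, returns ``None`` when nothing matches, and relies on
``email.message_from_string``, ``Message.walk``, ``Message.get_content_type``
and ``Message.get_payload(decode=True)``.  The sample message is hypothetical.

```python
import email


def extract_mime_part_matching(text, mimetype):
    """Return the decoded payload of the first part of type *mimetype*.

    Returns None when no part matches.
    """
    msg = email.message_from_string(text)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        if part.get_content_type() == mimetype:
            return part.get_payload(decode=True)
    return None


# Hypothetical sample message for illustration.
sample = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XYZ--\n"
)
html = extract_mime_part_matching(sample, "text/html")
```

With ``decode=True`` the payload is returned after undoing any
``Content-Transfer-Encoding``; on Python 3 that result is a ``bytes`` object.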