"""
The regular expression engine in '_sre' can segfault when interpreting
bogus bytecode.

It is unclear whether this is a real bug or a "won't fix" case like
bogus_code_obj.py, because it requires bytecode that is built by hand,
as opposed to compiled by 're' from a string-source regexp.  The
difference from bogus_code_obj, though, is that the only existing regexp
compiler is written in Python, so the C code has no choice but to
accept arbitrary bytecode from the Python level.

The test below builds and runs random bytecodes until 'match' crashes
Python.  I have not investigated exactly why the segfaults occur or how
hard they would be to fix.  Here are a few examples of 'code' that
segfault for me:

    [21, 50814, 8, 29, 16]
    [21, 3967, 26, 10, 23, 54113]
    [29, 23, 0, 2, 5]
    [31, 64351, 0, 28, 3, 22281, 20, 4463, 9, 25, 59154, 15245, 2,
                  16343, 3, 11600, 24380, 10, 37556, 10, 31, 15, 31]

Here is also a 'code' that triggers an infinite uninterruptible loop:

    [29, 1, 8, 21, 1, 43083, 6]

"""

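import _sre, random

def pick():
    # Draw one bytecode word.  Negative draws are folded into 0..31
    # (presumably the range of small opcode numbers), so about half of
    # the words are small and the rest are arbitrary values below 65536.
    n = random.randrange(-65536, 65536)
    if n < 0:
        n &= 31
    return n
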
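# A sketch, not part of the original test: the loop below draws from the
# global random state, so a crashing 'code' can only be recovered from the
# printed output.  The hypothetical helper 'seeded_code' builds a list with
# the same distribution as pick() from an explicit seed, so that a given
# list can be regenerated later.  It relies on the 'random' import above.
def seeded_code(seed):
    rng = random.Random(seed)        # private generator, independent of pick()
    length = rng.randrange(5, 25)    # same length range as the main loop
    code = []
    for _ in range(length):
        n = rng.randrange(-65536, 65536)
        if n < 0:
            n &= 31                  # same folding of negative draws as pick()
        code.append(n)
    return code
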
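# Subject strings to match against: empty, short, and long.
ss = ["", "world", "x" * 500]

while True:
    code = [pick() for i in range(random.randrange(5, 25))]
    # Print the code before running it, so that the last list printed
    # identifies the input that crashed the interpreter.
    print(code)
    pat = _sre.compile(None, 0, code)
    for s in ss:
        try:
            pat.match(s)
        except RuntimeError:
            # An error raised by the engine is not a crash; keep going.
            pass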