author | Simon Howkins <simonh@symbian.org> |
Mon, 15 Nov 2010 14:53:34 +0000 | |
branch | RCL_3 |
changeset 105 | 871af676edac |
parent 0 | dd21522fd290 |
permissions | -rw-r--r-- |
0
dd21522fd290
Revision: 200911
Kiiskinen Klaus (Nokia-D-MSW/Tampere) <klaus.kiiskinen@nokia.com>

This file has advice to help you merge newer versions of PCRE.

JavaScriptCore's PCRE is currently based on:

    PCRE 6.4

with the following differences:

1) We added a PCRE_UTF16 define that builds the library to work on UTF-16 strings
    rather than on ASCII or UTF-8.

    We introduced the public typedef pcre_char and the internal typedef pcre_uchar.

    We changed access to the digitab and ctypes arrays to be range-checked and to
    work only on values in the 0-255 range.

    We changed GETCHAR, GETCHARTEST, GETCHARINC, GETCHARINCTEST, and GETCHARLEN
    so they work on UTF-16.

    We added ISMIDCHAR to abstract the notion of characters to skip over, so that
    this is handled correctly for both UTF-16 and UTF-8, and changed code to call
    it when appropriate.

    We added GETUTF8CHARLEN and GETUTF8CHARINC, to be used in cases where we always
    process UTF-8, even if the subject string is UTF-16, and changed code to call
    them when appropriate.
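
As a rough illustration of the range-checked table access and the UTF-16
character fetch described in item 1, here is a minimal sketch. It is not the
library's code: the names demo_uchar, demo_ctypes, demo_is_digit and
demo_getchar16 are hypothetical, the table holds only a toy digit class, and
the decoder assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the internal UTF-16 code unit type. */
typedef uint16_t demo_uchar;

#define DEMO_CTYPE_DIGIT 0x04

/* Toy 256-entry class table; only '0'..'9' are filled in. */
static unsigned char demo_ctypes[256];

static void demo_init_ctypes(void)
{
    for (int c = '0'; c <= '9'; ++c)
        demo_ctypes[c] = DEMO_CTYPE_DIGIT;
}

/* Range-checked lookup: code units above 255 never index the table. */
static int demo_is_digit(demo_uchar c)
{
    return c <= 255 && (demo_ctypes[c] & DEMO_CTYPE_DIGIT) != 0;
}

/*
 * Decode the character at p without advancing. Assumes well-formed
 * input, so a high surrogate is always followed by a low surrogate.
 */
static uint32_t demo_getchar16(const demo_uchar *p)
{
    uint32_t c = p[0];
    if ((c & 0xFC00) == 0xD800)  /* high surrogate: combine with next unit */
        c = 0x10000 + ((c & 0x3FF) << 10) + (p[1] & 0x3FF);
    return c;
}
```

The point of the range check is that a 16-bit code unit can exceed the size of
a 256-entry table, so the lookup has to reject such values before indexing.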

2) We added a JAVASCRIPT define that turns off and alters various features to match
    the requirements of the JavaScript language specification.

    We removed these:

        \C \E \G \L \N \P \Q \U \X \Z
        \e \l \p \u \z
        [::] [..] [==]
        (?#) (?<=) (?<!) (?>)
        (?C) (?P) (?R)
        (?0) (and 1-9)
        (?imsxUX)

    And we added these:

        \u \v

    (Here \u is the JavaScript \uXXXX escape, unlike the \u removed above.)

    And we changed the semantics of \1-style backreferences to parentheses that
    did not participate in the match: they now match the empty string instead of
    failing to match. This is a difference between the JavaScript language
    specification and Perl.

    And we treat ASCII 0x0B (vertical tab) as a space.
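
The backreference and whitespace changes in item 2 can be pictured with the
sketch below. It is only an illustration under stated assumptions: the
demo_capture record, the convention that a negative start marks a group that
did not participate, and both function names are hypothetical, and the
whitespace test covers ASCII only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capture record: start < 0 means the group did not participate. */
typedef struct {
    int start;
    int end;
} demo_capture;

/*
 * Try to match a backreference at subject[pos]. Returns the number of
 * characters consumed, or -1 if the backreference fails to match.
 */
static int demo_match_backref(const char *subject, int length, int pos,
                              const demo_capture *cap)
{
    if (cap->start < 0)
        return 0;               /* unset group: matches the empty string */
    int n = cap->end - cap->start;
    if (n > length - pos)
        return -1;              /* not enough input left */
    if (memcmp(subject + cap->start, subject + pos, (size_t)n) != 0)
        return -1;
    return n;
}

/* ASCII whitespace test that also accepts vertical tab (0x0B). */
static int demo_is_space(int c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == 0x0B || c == '\f' || c == '\r';
}
```

Under the Perl-style behavior the unset-group case would return -1 instead of 0,
which is the whole difference this part of item 2 describes.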

3) We made a more efficient version of the NO_RECURSE mode that uses goto or computed
    goto statements instead of setjmp/longjmp, because that is much faster.
    We also allocate the first 16 stack frames on the stack instead of calling malloc
    every time; we use malloc only for deeper nesting.

    This included adding a numeric parameter to the RMATCH macro.
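
The frame allocation scheme in item 3 might look roughly like the sketch below.
It covers only the "first 16 frames inline, malloc beyond that" idea, not the
goto-based dispatch, and every name in it (demo_frame, demo_frame_stack,
demo_push, demo_pop, DEMO_INLINE_FRAMES) is hypothetical rather than taken from
the source.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_INLINE_FRAMES 16

/* Hypothetical frame; a real matcher would keep its saved locals here. */
typedef struct demo_frame {
    struct demo_frame *prev;
    int on_heap;
    int saved_state;
} demo_frame;

typedef struct {
    demo_frame inline_frames[DEMO_INLINE_FRAMES]; /* lives in the caller's stack */
    demo_frame *top;
    int depth;
} demo_frame_stack;

static void demo_stack_init(demo_frame_stack *s)
{
    s->top = NULL;
    s->depth = 0;
}

/* Push a frame; returns NULL only if malloc fails. */
static demo_frame *demo_push(demo_frame_stack *s, int state)
{
    demo_frame *f;
    if (s->depth < DEMO_INLINE_FRAMES) {
        f = &s->inline_frames[s->depth];   /* shallow nesting: no malloc */
        f->on_heap = 0;
    } else {
        f = malloc(sizeof *f);             /* deeper nesting: heap frame */
        if (f == NULL)
            return NULL;
        f->on_heap = 1;
    }
    f->prev = s->top;
    f->saved_state = state;
    s->top = f;
    s->depth++;
    return f;
}

/* Pop the top frame and return its saved state. The stack must not be empty. */
static int demo_pop(demo_frame_stack *s)
{
    demo_frame *f = s->top;
    assert(f != NULL);
    int state = f->saved_state;
    s->top = f->prev;
    s->depth--;
    if (f->on_heap)
        free(f);
    return state;
}
```

Because frames are pushed and popped strictly last-in, first-out, the current
depth is enough to pick the inline slot, and heap frames are freed as soon as
they are popped.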

4) The original PCRE relied on the input being a null-terminated string,
    even though pcre_exec takes a length parameter. We removed that restriction,
    passing additional parameters internally to make sure the code does not read
    off the end of the input buffer.

    We added the macro GETCHARLENEND to be used in some places where GETCHARLEN
    might otherwise walk off the end of the buffer.
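
To show what an end-aware character fetch involves, here is a minimal sketch
for UTF-16 input. It is not the GETCHARLENEND macro itself: demo_uchar,
demo_getchar_end and demo_count_chars are hypothetical names, and returning a
lone surrogate unchanged is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the internal UTF-16 code unit type. */
typedef uint16_t demo_uchar;

/*
 * Decode the character at p, never reading at or beyond end.
 * Stores the number of code units used (1 or 2) in *len.
 * A high surrogate with no low surrogate after it inside the buffer
 * is returned unchanged as a single unit.
 */
static uint32_t demo_getchar_end(const demo_uchar *p, const demo_uchar *end,
                                 int *len)
{
    assert(p < end);
    uint32_t c = p[0];
    *len = 1;
    if ((c & 0xFC00) == 0xD800 && p + 1 < end && (p[1] & 0xFC00) == 0xDC00) {
        c = 0x10000 + ((c & 0x3FF) << 10) + (p[1] & 0x3FF);
        *len = 2;
    }
    return c;
}

/* Count characters in a length-delimited buffer; no terminator is needed. */
static size_t demo_count_chars(const demo_uchar *subject, size_t length)
{
    const demo_uchar *p = subject;
    const demo_uchar *end = subject + length;
    size_t n = 0;
    while (p < end) {
        int len;
        (void)demo_getchar_end(p, end, &len);
        p += len;
        ++n;
    }
    return n;
}
```

The second code unit is examined only after the `p + 1 < end` test, so a buffer
that ends in the middle of a surrogate pair is never over-read.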

5) We added code to forbid values that are not Unicode characters from being used in
    \x and \u escape sequences in regular expressions.
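
The text above does not say exactly which values are rejected. Purely as an
illustration, the sketch below assumes the check is "no value above U+10FFFF";
that rule, and the names demo_hex_value, demo_parse_escape_value and
DEMO_MAX_CODE_POINT, are assumptions for this example and not taken from the
source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper bound: the last code point in the Unicode range. */
#define DEMO_MAX_CODE_POINT 0x10FFFFu

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int demo_hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/*
 * Parse count hex digits into *out. Returns 1 on success, or 0 if a
 * character is not a hex digit or the value exceeds the assumed bound.
 */
static int demo_parse_escape_value(const char *digits, int count, uint32_t *out)
{
    uint32_t value = 0;
    if (count <= 0)
        return 0;
    for (int i = 0; i < count; ++i) {
        int d = demo_hex_value(digits[i]);
        if (d < 0)
            return 0;
        value = value * 16 + (uint32_t)d;
        if (value > DEMO_MAX_CODE_POINT)
            return 0;   /* rejecting early also avoids overflow on long runs */
    }
    *out = value;
    return 1;
}
```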

6) We changed the names of the public entry points to have a kjs prefix so they don't
    collide with a "real" copy of PCRE at link or load time.

7) We added a hand-edited pcre-config.h, which is used instead of a configure-generated
    config.h file. Note that it is derived from the config.h.in in the PCRE distribution.

8) We eliminated non-ASCII characters from the source files (they were used in only
    one or two places).

9) We removed many unused source files.

10) We marked some additional global data tables const.

11) And we fixed some compiler warnings.

For easy merging:

1) We look for approaches that minimize changes to the base PCRE code.

2) When making global changes we leave alone code that we're not compiling.
    So code that's inside #if !JAVASCRIPT need not have the other changes above.

    This can be a bit strange. For example, there's a choice about what to do with
    the code that handles an end-of-pattern pointer or length rather than a trailing
    zero. Our strategy is to not make enhancements to code that we're not
    compiling, so if you turned off the JAVASCRIPT flag, you'd find that the
    range-checking changes are incomplete. This is solely to aid merging.

3) We are willing to format code strangely to minimize the differences from
    the base PCRE code.

Differences from the base PCRE code should be viewed with these comments in mind.