downloadkit/BeautifulSoup.py
author Simon Howkins <simonh@symbian.org>
Thu, 13 May 2010 16:27:37 +0100
changeset 244 2251fde91223
parent 124 b60a149520e7
permissions -rw-r--r--
Changed script to use CSV formatted input, rather than TSV. This means that the script can directly process the CSV downloaded from Bugzilla, without any need to use Excel to convert it.
"""Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.

A well-formed XML/HTML document yields a well-formed data
structure. An ill-formed XML/HTML document yields a correspondingly
ill-formed data structure. If your document is only locally
well-formed, you can use this library to find and process the
well-formed part of it.

Beautiful Soup works with Python 2.2 and up. It has no external
dependencies, but you'll have more success at converting data to UTF-8
if you also install these three packages:

* chardet, for auto-detecting character encodings
  http://chardet.feedparser.org/
* cjkcodecs and iconv_codec, which add more encodings to the ones supported
  by stock Python.
  http://cjkpython.i18n.org/

Beautiful Soup defines classes for two main parsing strategies:

 * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
   language that kind of looks like XML.

 * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
   or invalid. This class has web browser-like heuristics for
   obtaining a sensible parse tree in the face of common HTML errors.

Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
the encoding of an HTML or XML document, and converting it to
Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser.

For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Here, have some legalese:

Copyright (c) 2004-2009, Leonard Richardson

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above
    copyright notice, this list of conditions and the following
    disclaimer in the documentation and/or other materials provided
    with the distribution.

  * Neither the name of the the Beautiful Soup Consortium and All
    Night Kosher Bakery nor the names of its contributors may be
    used to endorse or promote products derived from this software
    without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE, DAMMIT.

"""
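The docstring says Beautiful Soup turns a possibly invalid document into a navigable tree. The toy sketch below makes that idea concrete using Python 3's standard html.parser module, which is the Python 3 name of the HTMLParser module imported in this file. It is not Beautiful Soup's algorithm, whose nesting heuristics are much richer. It shows only one simple recovery rule: an end tag closes the nearest open tag with the same name, and stray end tags are ignored. The names Node, TreeBuilder, parse and the VOID tag set are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node: a tag name, its attributes, and its children."""

    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.contents = []  # child Nodes and plain text strings, in order


class TreeBuilder(HTMLParser):
    """Builds a Node tree and tolerates unclosed and unmatched tags."""

    # Assumed set of tags that never have children (illustrative, not complete).
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.contents.append(node)
        if tag not in self.VOID:
            self.current = node  # descend until a matching end tag arrives

    def handle_endtag(self, tag):
        # Walk up to the nearest open tag with this name; this implicitly
        # closes any unclosed tags nested inside it.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:  # a stray end tag is simply ignored
            self.current = node.parent

    def handle_data(self, data):
        self.current.contents.append(data)


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any buffered trailing text
    return builder.root


if __name__ == "__main__":
    root = parse("<p>One<b>bold</p><p>Two")
    for child in root.contents:
        print(child.name, child.contents[0])  # p One, then p Two
```

Here the unclosed b ends up inside the first paragraph. The missing final </p> is implied when the input runs out.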
from __future__ import generators

__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "3.1.0.1"
__copyright__ = "Copyright (c) 2004-2009 Leonard Richardson"
__license__ = "New-style BSD"

import codecs
import markupbase
import types
import re
from HTMLParser import HTMLParser, HTMLParseError
try:
    from htmlentitydefs import name2codepoint
except ImportError:
    name2codepoint = {}
try:
    set
except NameError:
    from sets import Set as set

#These hacks make Beautiful Soup able to parse XML with namespaces
markupbase._declname_match = re.compile(r'[a-zA-Z][-_.:a-zA-Z0-9]*\s*').match
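The namespace hack widens the regular expression that markupbase uses to scan names inside <!...> declarations so that a colon counts as a name character. The Python 3 comparison below assumes that the stock pattern, as shipped in the standard library's markupbase (_markupbase in Python 3), is the same regex without ':'. The variable names are illustrative only.

```python
import re

# Assumed stock pattern: the same character class, but without ':'.
stock_declname = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match
# Pattern installed by the hack in this file: ':' is accepted inside a name.
patched_declname = re.compile(r'[a-zA-Z][-_.:a-zA-Z0-9]*\s*').match

decl = "svg:svg PUBLIC"
print(repr(stock_declname(decl).group()))    # 'svg'
print(repr(patched_declname(decl).group()))  # 'svg:svg '
```

With the stock pattern, scanning stops at the colon, so a namespaced name is cut short. In both patterns, the trailing \s* also consumes the whitespace that follows the name.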

DEFAULT_OUTPUT_ENCODING = "utf-8"

# First, the classes that represent markup elements.

def sob(unicode, encoding):
    """Returns either the given Unicode string or its encoding."""
    if encoding is None:
        return unicode
    else:
        return unicode.encode(encoding)
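sob() returns its text unchanged when no encoding is requested, and returns an encoded byte string otherwise. The sketch below restates that contract in Python 3. It assumes str stands in for Python 2's unicode type, and it renames the function to the hypothetical sob_py3 so that it does not shadow the original.

```python
def sob_py3(text, encoding):
    """Return text unchanged if encoding is None, else text encoded to bytes."""
    if encoding is None:
        return text
    return text.encode(encoding)


print(sob_py3("caf\u00e9", None) == "caf\u00e9")  # True
print(sob_py3("caf\u00e9", "utf-8"))              # b'caf\xc3\xa9'
```

Passing DEFAULT_OUTPUT_ENCODING ("utf-8") as the encoding produces the second form.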
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   113
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   114
class PageElement:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   115
    """Contains the navigational information for some part of the page
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   116
    (either a tag or a piece of text)"""
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   117
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   118
    def setup(self, parent=None, previous=None):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   119
        """Sets up the initial relations between this element and
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   120
        other elements."""
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   121
        self.parent = parent
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   122
        self.previous = previous
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   123
        self.next = None
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   124
        self.previousSibling = None
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   125
        self.nextSibling = None
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   126
        if self.parent and self.parent.contents:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   127
            self.previousSibling = self.parent.contents[-1]
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   128
            self.previousSibling.nextSibling = self
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   129
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   130
    def replaceWith(self, replaceWith):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   131
        oldParent = self.parent
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   132
        myIndex = self.parent.contents.index(self)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   133
        if hasattr(replaceWith, 'parent') and replaceWith.parent == self.parent:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   134
            # We're replacing this element with one of its siblings.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   135
            index = self.parent.contents.index(replaceWith)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   136
            if index and index < myIndex:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   137
                # Furthermore, it comes before this element. That
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   138
                # means that when we extract it, the index of this
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   139
                # element will change.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   140
                myIndex = myIndex - 1
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   141
        self.extract()
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   142
        oldParent.insert(myIndex, replaceWith)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   143
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   144
    def extract(self):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   145
        """Destructively rips this element out of the tree."""
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   146
        if self.parent:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   147
            try:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   148
                self.parent.contents.remove(self)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   149
            except ValueError:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   150
                pass
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   151
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   152
        #Find the two elements that would be next to each other if
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   153
        #this element (and any children) hadn't been parsed. Connect
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   154
        #the two.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   155
        lastChild = self._lastRecursiveChild()
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   156
        nextElement = lastChild.next
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   157
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   158
        if self.previous:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   159
            self.previous.next = nextElement
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   160
        if nextElement:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   161
            nextElement.previous = self.previous
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   162
        self.previous = None
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   163
        lastChild.next = None
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   164
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   165
        self.parent = None
        if self.previousSibling:
            self.previousSibling.nextSibling = self.nextSibling
        if self.nextSibling:
            self.nextSibling.previousSibling = self.previousSibling
        self.previousSibling = self.nextSibling = None
        return self

    def _lastRecursiveChild(self):
        """Returns the last element beneath this object in document order
        (the last one parsed), or the object itself if it has no children."""
        lastChild = self
        while hasattr(lastChild, 'contents') and lastChild.contents:
            lastChild = lastChild.contents[-1]
        return lastChild
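
    # Illustration (assumed markup, not taken from the original module): for
    #   soup = BeautifulSoup('<p><b>one</b><i>two</i></p>')
    # soup.p._lastRecursiveChild() follows the last child at each level
    # (the <i> tag, then its text) and returns the NavigableString 'two'.
    # insert() relies on this to find where a subtree ends in the
    # next/previous chain.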

    def insert(self, position, newChild):
        """Inserts newChild into this object's contents at the given
        position, updating the sibling links and the document-order
        next/previous chain."""
        # unicode is a subclass of basestring, so one check covers both.
        if isinstance(newChild, basestring) \
            and not isinstance(newChild, NavigableString):
            newChild = NavigableString(newChild)

        position = min(position, len(self.contents))
        if hasattr(newChild, 'parent') and newChild.parent is not None:
            # We're 'inserting' an element that already has a parent, so
            # it must be extracted from its current location first.
            if newChild.parent is self:
                # It is one of this object's own children. Find its index
                # by identity: Tag.__eq__ compares structure, so an
                # equal-looking sibling must not be mistaken for it.
                index = None
                for i, child in enumerate(self.contents):
                    if child is newChild:
                        index = i
                        break
                if index is not None and index < position:
                    # Furthermore we're moving it further down the
                    # list of this object's children. That means that
                    # when we extract this element, our target index
                    # will drop by one.
                    position = position - 1
            newChild.extract()

        newChild.parent = self
        previousChild = None
        if position == 0:
            newChild.previousSibling = None
            newChild.previous = self
        else:
            previousChild = self.contents[position-1]
            newChild.previousSibling = previousChild
            newChild.previousSibling.nextSibling = newChild
            newChild.previous = previousChild._lastRecursiveChild()
        if newChild.previous:
            newChild.previous.next = newChild

        newChildsLastElement = newChild._lastRecursiveChild()

        if position >= len(self.contents):
            newChild.nextSibling = None

            parent = self
            parentsNextSibling = None
            while not parentsNextSibling:
                parentsNextSibling = parent.nextSibling
                parent = parent.parent
                if not parent: # This is the last element in the document.
                    break
            if parentsNextSibling:
                newChildsLastElement.next = parentsNextSibling
            else:
                newChildsLastElement.next = None
        else:
            nextChild = self.contents[position]
            newChild.nextSibling = nextChild
            if newChild.nextSibling:
                newChild.nextSibling.previousSibling = newChild
            newChildsLastElement.next = nextChild

        if newChildsLastElement.next:
            newChildsLastElement.next.previous = newChildsLastElement
        self.contents.insert(position, newChild)
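
    # Usage sketch (illustrative markup; assumes this module is importable
    # as BeautifulSoup):
    #   soup = BeautifulSoup('<p><b>one</b></p>')
    #   soup.p.insert(0, 'zero')  # a plain str is wrapped in a NavigableString
    #   str(soup)                 # should render as '<p>zero<b>one</b></p>'
    # After the call, soup.p.next is the new string and that string's .next
    # is the <b> tag, so the document-order chain stays consistent.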

    def append(self, tag):
        """Appends the given tag or string to the end of this tag's
        contents."""
        self.insert(len(self.contents), tag)

    def findNext(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the first item that matches the given criteria and
        appears after this Tag in the document."""
        return self._findOne(self.findAllNext, name, attrs, text, **kwargs)

    def findAllNext(self, name=None, attrs={}, text=None, limit=None,
                    **kwargs):
        """Returns all items that match the given criteria and appear
        after this Tag in the document."""
        return self._findAll(name, attrs, text, limit, self.nextGenerator,
                             **kwargs)
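
    # Usage sketch (illustrative markup):
    #   soup = BeautifulSoup('<p><b>one</b><i>two</i></p>')
    #   soup.b.findNext('i')            # the <i> tag
    #   soup.b.findAllNext(text=True)   # the strings 'one' and 'two'
    # Both walk the .next chain, so the search descends into children
    # (here the text inside <b>) as well as later siblings.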

    def findNextSibling(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the closest sibling to this Tag that matches the
        given criteria and appears after this Tag in the document."""
        return self._findOne(self.findNextSiblings, name, attrs, text,
                             **kwargs)

    def findNextSiblings(self, name=None, attrs={}, text=None, limit=None,
                         **kwargs):
        """Returns the siblings of this Tag that match the given
        criteria and appear after this Tag in the document."""
        return self._findAll(name, attrs, text, limit,
                             self.nextSiblingGenerator, **kwargs)
    fetchNextSiblings = findNextSiblings # Compatibility with pre-3.x
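
    # Usage sketch (illustrative markup):
    #   soup = BeautifulSoup('<p><b>one</b><i>two</i><b>three</b></p>')
    #   soup.b.findNextSibling('b')   # the second <b>, skipping the <i>
    #   soup.b.findNextSiblings()     # the <i> tag and the second <b>
    # Unlike findNext(), these follow .nextSibling only, so nothing inside
    # the intervening tags is examined.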

    def findPrevious(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the first item that matches the given criteria and
        appears before this Tag in the document."""
        return self._findOne(self.findAllPrevious, name, attrs, text, **kwargs)

    def findAllPrevious(self, name=None, attrs={}, text=None, limit=None,
                        **kwargs):
        """Returns all items that match the given criteria and appear
        before this Tag in the document, nearest first."""
        return self._findAll(name, attrs, text, limit, self.previousGenerator,
                             **kwargs)
    fetchPrevious = findAllPrevious # Compatibility with pre-3.x

    def findPreviousSibling(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the closest sibling to this Tag that matches the
        given criteria and appears before this Tag in the document."""
        return self._findOne(self.findPreviousSiblings, name, attrs, text,
                             **kwargs)

    def findPreviousSiblings(self, name=None, attrs={}, text=None,
                             limit=None, **kwargs):
        """Returns the siblings of this Tag that match the given
        criteria and appear before this Tag in the document, nearest
        first."""
        return self._findAll(name, attrs, text, limit,
                             self.previousSiblingGenerator, **kwargs)
    fetchPreviousSiblings = findPreviousSiblings # Compatibility with pre-3.x
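
    # Usage sketch (illustrative markup):
    #   soup = BeautifulSoup('<p><b>one</b><i>two</i><u>three</u></p>')
    #   soup.u.findPreviousSiblings()   # the <i> tag, then the <b> tag
    # Results come back in the order the .previousSibling chain visits
    # them, i.e. closest sibling first rather than document order.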

    def findParent(self, name=None, attrs={}, **kwargs):
        """Returns the closest parent of this Tag that matches the given
        criteria."""
        # NOTE: We can't use _findOne because findParents takes a different
        # set of arguments. **kwargs is forwarded so that keyword attribute
        # filters (e.g. id='x') are honoured here too.
        r = None
        l = self.findParents(name, attrs, 1, **kwargs)
        if l:
            r = l[0]
        return r

    def findParents(self, name=None, attrs={}, limit=None, **kwargs):
        """Returns the parents of this Tag that match the given
        criteria, innermost first."""

        return self._findAll(name, attrs, None, limit, self.parentGenerator,
                             **kwargs)
    fetchParents = findParents # Compatibility with pre-3.x
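
    # Usage sketch (illustrative markup):
    #   soup = BeautifulSoup('<div id="a"><p><b>x</b></p></div>')
    #   soup.b.findParent('div')           # the enclosing <div>
    #   soup.b.findParents('p')            # a one-element list holding the <p>
    #   soup.b.findParents('div', id='a')  # keywords act as attribute filters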

    # These methods do the real heavy lifting.

    def _findOne(self, method, name, attrs, text, **kwargs):
        """Runs method with a limit of 1 and returns its first result,
        or None if nothing matched."""
        r = None
        l = method(name, attrs, text, 1, **kwargs)
        if l:
            r = l[0]
        return r

    def _findAll(self, name, attrs, text, limit, generator, **kwargs):
        "Iterates over a generator looking for things that match."

        if isinstance(name, SoupStrainer):
            strainer = name
        else:
            # Build a SoupStrainer
            strainer = SoupStrainer(name, attrs, text, **kwargs)
        results = ResultSet(strainer)
        for i in generator():
            # The generators yield a trailing None when they run off the
            # end of the tree; skip it (and any other false item).
            if i:
                found = strainer.search(i)
                if found:
                    results.append(found)
                    if limit and len(results) >= limit:
                        break
        return results
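
    # Note on limit (illustrative): _findAll() stops pulling from the lazy
    # generator as soon as len(results) reaches limit, so
    #   soup.b.findAllNext('i', limit=1)
    # walks the tree only as far as the first matching <i>. A limit of None
    # or 0 means "no limit", because `if limit and ...` treats both as
    # false. _findOne() relies on this behaviour by passing a limit of 1.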

    # These generators can be used to navigate starting from both
    # NavigableStrings and Tags. Each one yields None as its final item
    # once it runs off the end of the chain it follows.
    def nextGenerator(self):
        i = self
        while i:
            i = i.next
            yield i

    def nextSiblingGenerator(self):
        i = self
        while i:
            i = i.nextSibling
            yield i
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   361
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   362
    def previousGenerator(self):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   363
        i = self
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   364
        while i:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   365
            i = i.previous
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   366
            yield i
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   367
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   368
    def previousSiblingGenerator(self):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   369
        i = self
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   370
        while i:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   371
            i = i.previousSibling
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   372
            yield i
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   373
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   374
    def parentGenerator(self):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   375
        i = self
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   376
        while i:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   377
            i = i.parent
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   378
            yield i
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   379
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   380
    # Utility methods
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   381
    def substituteEncoding(self, str, encoding=None):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   382
        encoding = encoding or "utf-8"
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   383
        return str.replace("%SOUP-ENCODING%", encoding)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   384
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   385
    def toEncoding(self, s, encoding=None):
        """Encodes an object to a byte string in the given encoding or,
        if no encoding is given, to Unicode."""
        if isinstance(s, unicode):
            if encoding:
                s = s.encode(encoding)
        elif isinstance(s, str):
            if encoding:
                s = s.encode(encoding)
            else:
                s = unicode(s)
        else:
            if encoding:
                s = self.toEncoding(str(s), encoding)
            else:
                s = unicode(s)
        return s
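
# Example (hypothetical, not part of Beautiful Soup): a minimal sketch of the
# pointer-chasing pattern used by nextGenerator() and its siblings above. It
# assumes a bare node object with a single `next` attribute standing in for a
# real PageElement. Because each generator advances *before* yielding, its
# last value is None, which is why the result-collecting loop above checks
# `if i:` before calling strainer.search(i). _ExampleNode,
# _example_next_generator and _example_find_all are illustrative names only.


class _ExampleNode(object):
    """Stand-in for a PageElement: a value plus a `next` pointer."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def _example_next_generator(node):
    """Same shape as PageElement.nextGenerator: advance, then yield."""
    i = node
    while i:
        i = i.next
        yield i


def _example_find_all(start, predicate, limit=None):
    """Collect nodes after `start` that satisfy `predicate`, up to `limit`.

    The trailing None from the generator is skipped, mirroring the
    `if i:` guard in the search loop above.
    """
    results = []
    for i in _example_next_generator(start):
        if i:
            if predicate(i):
                results.append(i)
                if limit and len(results) >= limit:
                    break
    return results

# For a chain a -> b -> c, _example_next_generator(a) yields b, c and then None.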

class NavigableString(unicode, PageElement):

    def __new__(cls, value):
        """Create a new NavigableString.

        When unpickling a NavigableString, this method is called with
        the string in DEFAULT_OUTPUT_ENCODING. That encoding needs to be
        passed in to the superclass's __new__ or the superclass won't know
        how to handle non-ASCII characters.
        """
        if isinstance(value, unicode):
            return unicode.__new__(cls, value)
        return unicode.__new__(cls, value, DEFAULT_OUTPUT_ENCODING)

    def __getnewargs__(self):
        return (unicode(self),)

    def __getattr__(self, attr):
        """text.string gives you text. This is for backwards
        compatibility for Navigable*String, but for CData* it lets you
        get the string without the CData wrapper."""
        if attr == 'string':
            return self
        else:
            raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)

    def encode(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return self.decode().encode(encoding)

    def decodeGivenEventualEncoding(self, eventualEncoding):
        return self
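
# Example (hypothetical, not part of Beautiful Soup): a Python 3 sketch of why
# NavigableString defines both __new__ and __getnewargs__. With pickle protocol
# 2 or later, pickle stores the tuple returned by __getnewargs__ and passes it
# back to cls.__new__ when loading, so a text subclass keeps its type across a
# round trip. UTF-8 stands in for DEFAULT_OUTPUT_ENCODING here, and
# _ExampleText and _example_roundtrip are illustrative names only.


class _ExampleText(str):

    def __new__(cls, value):
        # Accept bytes as well as text; bytes are assumed to be UTF-8.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        return str.__new__(cls, value)

    def __getnewargs__(self):
        # pickle passes this tuple back to __new__ when unpickling.
        return (str(self),)

    def __getattr__(self, attr):
        # Only reached when normal lookup fails: `.string` gives the text back,
        # as NavigableString.__getattr__ does; anything else is an error.
        if attr == "string":
            return self
        raise AttributeError("%r object has no attribute %r"
                             % (type(self).__name__, attr))


def _example_roundtrip(obj):
    # Protocol 2 is the earliest pickle protocol that uses __getnewargs__.
    import pickle
    return pickle.loads(pickle.dumps(obj, 2))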

class CData(NavigableString):

    def decodeGivenEventualEncoding(self, eventualEncoding):
        return u'<![CDATA[' + self + u']]>'

class ProcessingInstruction(NavigableString):

    def decodeGivenEventualEncoding(self, eventualEncoding):
        output = self
        if u'%SOUP-ENCODING%' in output:
            output = self.substituteEncoding(output, eventualEncoding)
        return u'<?' + output + u'?>'
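
# Example (hypothetical, not part of Beautiful Soup): a standalone sketch of
# what ProcessingInstruction.decodeGivenEventualEncoding does. Any
# %SOUP-ENCODING% placeholder is replaced by the eventual output encoding
# (falling back to "utf-8", as substituteEncoding does), and the text is then
# wrapped in <? ... ?>. _example_render_pi is an illustrative name only.


def _example_render_pi(text, eventual_encoding=None):
    output = text
    if u'%SOUP-ENCODING%' in output:
        # Same fallback as PageElement.substituteEncoding.
        output = output.replace(u'%SOUP-ENCODING%', eventual_encoding or u'utf-8')
    return u'<?' + output + u'?>'

# _example_render_pi(u'xml version="1.0" encoding="%SOUP-ENCODING%"', 'latin-1')
# returns u'<?xml version="1.0" encoding="latin-1"?>'.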

class Comment(NavigableString):
    def decodeGivenEventualEncoding(self, eventualEncoding):
        return u'<!--' + self + u'-->'

class Declaration(NavigableString):
    def decodeGivenEventualEncoding(self, eventualEncoding):
        return u'<!' + self + u'>'
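
# Example (hypothetical, not part of Beautiful Soup): CData, Comment and
# Declaration differ only in the markup their decodeGivenEventualEncoding
# wraps around the text. This table-driven sketch reproduces those three
# wrappers for plain strings; _EXAMPLE_WRAPPERS and _example_render are
# illustrative names only. The module itself uses one subclass per kind,
# which also lets each node in the tree carry its type.


_EXAMPLE_WRAPPERS = {
    'CData': (u'<![CDATA[', u']]>'),
    'Comment': (u'<!--', u'-->'),
    'Declaration': (u'<!', u'>'),
}


def _example_render(kind, text):
    # Kinds without a wrapper come back unchanged, like a plain
    # NavigableString, whose decodeGivenEventualEncoding returns itself.
    prefix, suffix = _EXAMPLE_WRAPPERS.get(kind, (u'', u''))
    return prefix + text + suffix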

class Tag(PageElement):

    """Represents a found HTML tag with its attributes and contents."""

    def _invert(h):
        "Cheap function to invert a dict, mapping each value back to its key."
        i = {}
        for k,v in h.items():
            i[v] = k
        return i

    XML_ENTITIES_TO_SPECIAL_CHARS = { "apos" : "'",
                                      "quot" : '"',
                                      "amp" : "&",
                                      "lt" : "<",
                                      "gt" : ">" }

    XML_SPECIAL_CHARS_TO_ENTITIES = _invert(XML_ENTITIES_TO_SPECIAL_CHARS)

    def _convertEntities(self, match):
        """Used in a call to re.sub to replace HTML, XML, and numeric
        entities with the appropriate Unicode characters. Unrecognized
        entities are escaped (as &amp;name;) if escapeUnrecognizedEntities
        is set, and left as they are otherwise."""
        x = match.group(1)
        if self.convertHTMLEntities and x in name2codepoint:
            return unichr(name2codepoint[x])
        elif x in self.XML_ENTITIES_TO_SPECIAL_CHARS:
            if self.convertXMLEntities:
                return self.XML_ENTITIES_TO_SPECIAL_CHARS[x]
            else:
                return u'&%s;' % x
        elif len(x) > 0 and x[0] == '#':
            # Handle numeric entities
            if len(x) > 1 and x[1] == 'x':
                return unichr(int(x[2:], 16))
            else:
                return unichr(int(x[1:]))

        elif self.escapeUnrecognizedEntities:
            return u'&amp;%s;' % x
        else:
            return u'&%s;' % x

    def __init__(self, parser, name, attrs=None, parent=None,
                 previous=None):
        "Basic constructor."

        # We don't actually store the parser object: that lets extracted
        # chunks be garbage-collected
        self.parserClass = parser.__class__
        self.isSelfClosing = parser.isSelfClosingTag(name)
        self.name = name
        if attrs is None:
            attrs = []
        self.attrs = attrs
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   512
        self.contents = []
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   513
        self.setup(parent, previous)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   514
        self.hidden = False
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   515
        self.containsSubstitutions = False
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   516
        self.convertHTMLEntities = parser.convertHTMLEntities
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   517
        self.convertXMLEntities = parser.convertXMLEntities
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   518
        self.escapeUnrecognizedEntities = parser.escapeUnrecognizedEntities
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   519
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   520
        def convert(kval):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   521
            "Converts HTML, XML and numeric entities in the attribute value."
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   522
            k, val = kval
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   523
            if val is None:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   524
                return kval
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   525
            return (k, re.sub("&(#\d+|#x[0-9a-fA-F]+|\w+);",
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   526
                              self._convertEntities, val))
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   527
        self.attrs = map(convert, self.attrs)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
   528
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
    def get(self, key, default=None):
        """Returns the value of the 'key' attribute for the tag, or
        the value given for 'default' if it doesn't have that
        attribute."""
        return self._getAttrMap().get(key, default)

    def has_key(self, key):
        "Returns True if the tag has a 'key' attribute."
        return self._getAttrMap().has_key(key)

    def __getitem__(self, key):
        """tag[key] returns the value of the 'key' attribute for the tag,
        and raises KeyError if the tag has no such attribute."""
        return self._getAttrMap()[key]

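    # Usage sketch for the attribute accessors above, assuming BeautifulSoup
    # 3.x under Python 2 (the markup is made up for illustration):
    #   a = BeautifulSoup('<a href="/x">link</a>').a
    #   a.get('href')            # the value '/x'
    #   a.get('title', 'none')   # 'none': the tag has no title attribute
    #   a.has_key('title')       # False
    #   a['title']               # raises KeyError
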
    def __iter__(self):
        "Iterating over a tag iterates over its contents."
        return iter(self.contents)

    def __len__(self):
        "The length of a tag is the length of its list of contents."
        return len(self.contents)

    def __contains__(self, x):
        "'x in tag' tests membership in the tag's contents, not its attributes."
        return x in self.contents

    def __nonzero__(self):
        "A tag is always true in a boolean context, even if it has no contents."
        return True

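    # Sketch of the container protocol, assuming BeautifulSoup 3.x under
    # Python 2 (made-up markup):
    #   a = BeautifulSoup('<a href="/x">link</a>').a
    #   len(a)          # 1: the single text node 'link'
    #   'href' in a     # False: membership tests contents, not attributes
    #   b = BeautifulSoup('<b></b>').b
    #   bool(b)         # True although len(b) == 0, thanks to __nonzero__
    # This is what makes "if soup.find('b'):" a test for whether the tag
    # was found at all.
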
    def __setitem__(self, key, value):
        """Setting tag[key] sets the value of the 'key' attribute for the
        tag."""
        self._getAttrMap()
        self.attrMap[key] = value
        found = False
        for i in range(0, len(self.attrs)):
            if self.attrs[i][0] == key:
                self.attrs[i] = (key, value)
                found = True
        if not found:
            self.attrs.append((key, value))

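    # Sketch of the assignment behaviour, assuming BeautifulSoup 3.x under
    # Python 2 (made-up markup):
    #   a = BeautifulSoup('<a href="/x" id="l">link</a>').a
    #   a['href'] = '/y'    # existing attribute updated in place
    #   a['title'] = 'T'    # new attribute appended at the end
    #   str(a)              # '<a href="/y" id="l" title="T">link</a>'
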
    def __delitem__(self, key):
        "Deleting tag[key] deletes all 'key' attributes for the tag."
        # Build a new list instead of removing items while iterating.
        # Every match is dropped because bad HTML can define the same
        # attribute multiple times.
        self.attrs = [item for item in self.attrs if item[0] != key]
        self._getAttrMap()
        if self.attrMap.has_key(key):
            del self.attrMap[key]

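    # A note on deleting duplicate attributes: removing items from a list
    # while iterating over that same list skips the item that follows each
    # removal. In plain Python:
    #   attrs = [('a', '1'), ('a', '2'), ('b', '3')]
    #   for item in attrs:
    #       if item[0] == 'a':
    #           attrs.remove(item)
    #   attrs               # [('a', '2'), ('b', '3')]: one 'a' survives
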
    def __call__(self, *args, **kwargs):
        """Calling a tag like a function is the same as calling its
        findAll() method. E.g. tag('a') returns a list of all the A tags
        found within this tag."""
        return self.findAll(*args, **kwargs)

    def __getattr__(self, tag):
        #print "Getattr %s.%s" % (self.__class__, tag)
        # tag.fooTag and tag.foo are both shorthand for tag.find('foo').
        # Names starting with '__' are refused so that Python's lookups of
        # special methods fail normally instead of searching the tree.
        if len(tag) > 3 and tag.endswith('Tag'):
            return self.find(tag[:-3])
        elif not tag.startswith('__'):
            return self.find(tag)
        raise AttributeError("'%s' object has no attribute '%s'" % (self.__class__, tag))

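    # Sketch of the shorthand lookups, assuming BeautifulSoup 3.x under
    # Python 2 (made-up markup):
    #   soup = BeautifulSoup('<p><b>one</b><b>two</b></p>')
    #   soup.b          # same as soup.find('b'): the first <b> tag
    #   soup.bTag       # same lookup; the 'Tag' suffix is stripped
    #   soup.i          # None: there is no <i> tag
    #   soup('b')       # same as soup.findAll('b'): both <b> tags
    # Real attributes win, so soup.find is the method, not a <find> tag.
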
    def __eq__(self, other):
        """Returns true iff this tag has the same name, the same attributes,
        and the same contents (recursively) as the given tag.

        NOTE: right now this will return false if two tags have the
        same attributes in a different order. Should this be fixed?"""
        if (not hasattr(other, 'name') or
            not hasattr(other, 'attrs') or
            not hasattr(other, 'contents') or
            self.name != other.name or
            self.attrs != other.attrs or
            len(self) != len(other)):
            return False
        for i in range(0, len(self.contents)):
            if self.contents[i] != other.contents[i]:
                return False
        return True

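    # A possible answer to the NOTE above (a hypothetical change, not what
    # __eq__ currently does): compare the attribute lists after sorting,
    #   sorted(self.attrs) != sorted(other.attrs)
    # Sorting the (name, value) pairs ignores order but, unlike comparing
    # dicts, still tells apart tags that repeat an attribute.
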
    def __ne__(self, other):
        """Returns true iff this tag is not identical to the other tag,
        as defined in __eq__."""
        return not self == other

    def __repr__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        """Renders this tag as a string."""
        return self.decode(eventualEncoding=encoding)

    BARE_AMPERSAND_OR_BRACKET = re.compile("([<>]|"
                                           + "&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)"
                                           + ")")

    def _sub_entity(self, x):
        """Used with a regular expression to substitute the
        appropriate XML entity for an XML special character."""
        return "&" + self.XML_SPECIAL_CHARS_TO_ENTITIES[x.group(0)[0]] + ";"

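    # Escaping sketch, assuming XML_SPECIAL_CHARS_TO_ENTITIES maps '<', '>'
    # and '&' to 'lt', 'gt' and 'amp':
    #   self.BARE_AMPERSAND_OR_BRACKET.sub(self._sub_entity,
    #                                      'a < b & c &amp; d')
    #   # -> 'a &lt; b &amp; c &amp; d'
    # The existing '&amp;' is left alone because the lookahead sees that it
    # already completes an entity.
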
    def __unicode__(self):
        return self.decode()

    def __str__(self):
        return self.encode()

    def encode(self, encoding=DEFAULT_OUTPUT_ENCODING,
               prettyPrint=False, indentLevel=0):
        return self.decode(prettyPrint, indentLevel, encoding).encode(encoding)

    def decode(self, prettyPrint=False, indentLevel=0,
               eventualEncoding=DEFAULT_OUTPUT_ENCODING):
        """Returns a string or Unicode representation of this tag and
        its contents. Unless eventualEncoding is None, it is substituted
        for any '%SOUP-ENCODING%' placeholder in attribute values."""

        attrs = []
        if self.attrs:
            for key, val in self.attrs:
                fmt = '%s="%s"'
                if isString(val):
                    if (self.containsSubstitutions
                        and eventualEncoding is not None
                        and '%SOUP-ENCODING%' in val):
                        val = self.substituteEncoding(val, eventualEncoding)

                    # The attribute value either:
                    #
                    # * Contains no embedded double quotes or single quotes.
                    #   No problem: we enclose it in double quotes.
                    # * Contains embedded single quotes. No problem:
                    #   double quotes work here too.
                    # * Contains embedded double quotes. No problem:
                    #   we enclose it in single quotes.
                    # * Embeds both single _and_ double quotes. This
                    #   can't happen naturally, but it can happen if
                    #   you modify an attribute value after parsing
                    #   the document. Now we have a bit of a
                    #   problem. We solve it by enclosing the
                    #   attribute in single quotes, and escaping any
                    #   embedded single quotes as character references.
                    if '"' in val:
                        fmt = "%s='%s'"
                        if "'" in val:
                            # Use the numeric reference: &apos; is not
                            # defined in HTML 4, while &#39; is valid in
                            # both HTML and XML.
                            val = val.replace("'", "&#39;")

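                    # For example, for an attribute named title:
                    #   value       rendered as
                    #   plain       title="plain"
                    #   it's        title="it's"
                    #   say "hi"    title='say "hi"'
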
                    # Now we're okay w/r/t quotes. But the attribute
                    # value might also contain angle brackets, or
                    # ampersands that aren't part of entities. We need
                    # to escape those to XML entities too.
                    val = self.BARE_AMPERSAND_OR_BRACKET.sub(self._sub_entity, val)
                if val is None:
                    # Handle boolean attributes.
                    decoded = key
                else:
                    decoded = fmt % (key, val)
                attrs.append(decoded)
        close = ''
        closeTag = ''
        if self.isSelfClosing:
            close = ' /'
        else:
            closeTag = '</%s>' % self.name

        indentTag, indentContents = 0, 0
        if prettyPrint:
            indentTag = indentLevel
            space = (' ' * (indentTag-1))
            indentContents = indentTag + 1
        contents = self.decodeContents(prettyPrint, indentContents,
                                       eventualEncoding)
        if self.hidden:
            s = contents
        else:
            s = []
            attributeString = ''
            if attrs:
                attributeString = ' ' + ' '.join(attrs)
            if prettyPrint:
                s.append(space)
            s.append('<%s%s%s>' % (self.name, attributeString, close))
            if prettyPrint:
                s.append("\n")
            s.append(contents)
            if prettyPrint and contents and contents[-1] != "\n":
                s.append("\n")
            if prettyPrint and closeTag:
                s.append(space)
            s.append(closeTag)
            if prettyPrint and closeTag and self.nextSibling:
                s.append("\n")
            s = ''.join(s)
parents:
diff changeset
   721
        return s

    def decompose(self):
        """Recursively destroys the contents of this tree."""
        contents = [i for i in self.contents]
        for i in contents:
            if isinstance(i, Tag):
                i.decompose()
            else:
                i.extract()
        self.extract()
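
    # Example (a hedged usage sketch, not part of the original module; it
    # assumes the BeautifulSoup class defined elsewhere in this module):
    #
    #   soup = BeautifulSoup('<div><p>one</p><p>two</p></div>')
    #   soup.find('p').decompose()   # detach the first <p> and its text
    #   len(soup.findAll('p'))       # expected: 1
    #
    # extract() removes a tag but leaves its own subtree linked together.
    # decompose() also detaches every descendant, taking the removed
    # subtree apart node by node.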

    def prettify(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return self.encode(encoding, True)

    def encodeContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                       prettyPrint=False, indentLevel=0):
        return self.decodeContents(prettyPrint, indentLevel).encode(encoding)

    def decodeContents(self, prettyPrint=False, indentLevel=0,
                       eventualEncoding=DEFAULT_OUTPUT_ENCODING):
        """Renders the contents of this tag as a Unicode string.
        eventualEncoding names the encoding the result will later be
        encoded to; it is passed down to each child as it is rendered."""
        s = []
        for c in self:
            text = None
            if isinstance(c, NavigableString):
                text = c.decodeGivenEventualEncoding(eventualEncoding)
            elif isinstance(c, Tag):
                s.append(c.decode(prettyPrint, indentLevel, eventualEncoding))
            if text and prettyPrint:
                text = text.strip()
            if text:
                if prettyPrint:
                    s.append(" " * (indentLevel-1))
                s.append(text)
                if prettyPrint:
                    s.append("\n")
        return ''.join(s)

    #Soup methods

    def find(self, name=None, attrs={}, recursive=True, text=None,
             **kwargs):
        """Return the first descendant of this Tag (or, if recursive is
        False, the first direct child) matching the given criteria, or
        None if nothing matches."""
        r = None
        l = self.findAll(name, attrs, recursive, text, 1, **kwargs)
        if l:
            r = l[0]
        return r
    findChild = find

    def findAll(self, name=None, attrs={}, recursive=True, text=None,
                limit=None, **kwargs):
        """Returns a list of Tag objects that match the given
        criteria.  You can specify the name of the Tag and any
        attributes you want the Tag to have.

        The value of a key-value pair in the 'attrs' map can be a
        string, a list of strings, a regular expression object, or a
        callable that takes a string and returns whether or not the
        string matches for some custom definition of 'matches'. The
        same is true of the tag name."""
        generator = self.recursiveChildGenerator
        if not recursive:
            generator = self.childGenerator
        return self._findAll(name, attrs, text, limit, generator, **kwargs)
    findChildren = findAll
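
    # Examples of the matching forms described in the findAll() docstring
    # (a hedged sketch assuming `import re` and an already parsed document
    # `soup`; the attribute values are illustrative only):
    #
    #   soup.findAll('a', {'class': 'nav'})              # exact string
    #   soup.findAll('a', {'class': ['nav', 'menu']})    # any string in list
    #   soup.findAll('a', {'href': re.compile('^http')}) # regular expression
    #   soup.findAll('a', href=lambda v: v and v.endswith('.zip'))
    #   soup.findAll(re.compile('^h[1-6]$'))             # tag name by regex
    #   soup.findAll('li', recursive=False)              # direct children only
    #   soup.find('a', 'nav')   # a plain string for attrs matches 'class'
    #
    # searchTag() looks attribute values up with .get(), so a callable
    # matcher can receive None for a tag that lacks the attribute; hence
    # the `v and ...` guard above.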

    # Pre-3.x compatibility methods. Will go away in 4.0.
    first = find
    fetch = findAll

    def fetchText(self, text=None, recursive=True, limit=None):
        return self.findAll(text=text, recursive=recursive, limit=limit)

    def firstText(self, text=None, recursive=True):
        return self.find(text=text, recursive=recursive)

    # 3.x compatibility methods. Will go away in 4.0.
    def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                       prettyPrint=False, indentLevel=0):
        if encoding is None:
            return self.decodeContents(prettyPrint, indentLevel, encoding)
        else:
            return self.encodeContents(encoding, prettyPrint, indentLevel)


    #Private methods

    def _getAttrMap(self):
        """Initializes a map representation of this tag's attributes,
        if not already initialized."""
        if not getattr(self, 'attrMap'):
            self.attrMap = {}
            for (key, value) in self.attrs:
                self.attrMap[key] = value
        return self.attrMap
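
    # Illustration (hedged; the tag is hypothetical): for a tag parsed from
    # <a href="x" id="y">, self.attrs is a list of (name, value) pairs
    # equivalent to [('href', 'x'), ('id', 'y')]. The first call to
    # _getAttrMap() builds the dict {'href': 'x', 'id': 'y'} and caches it
    # in self.attrMap, and later calls reuse it (an empty map is simply
    # rebuilt). If a name appeared twice in self.attrs, the last pair
    # would win in the map.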

    #Generator methods
    def recursiveChildGenerator(self):
        if not self.contents:
            return
        stopNode = self._lastRecursiveChild().next
        current = self.contents[0]
        while current is not stopNode:
            yield current
            current = current.next

    def childGenerator(self):
        if not self.contents:
            return
        current = self.contents[0]
        # Compare against None explicitly: an empty NavigableString is
        # falsy and would otherwise end the iteration early.
        while current is not None:
            yield current
            current = current.nextSibling
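
    # Sketch of the difference between the two generators (hedged; assumes
    # a document parsed from '<ul><li>a</li><li>b</li></ul>'):
    #
    #   ul = soup.find('ul')
    #   list(ul.childGenerator())
    #   # -> the two <li> tags, reached by following nextSibling
    #   list(ul.recursiveChildGenerator())
    #   # -> <li>, the string 'a', <li>, the string 'b': every descendant
    #   #    in document order, reached by following next until stopNode
    #   #    (the node after the last descendant)
    #
    # findAll(recursive=False) uses childGenerator; the default uses
    # recursiveChildGenerator.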

# Next, a couple of classes to represent queries and their results.
class SoupStrainer:
    """Encapsulates a number of ways of matching a markup element (tag or
    text)."""

    def __init__(self, name=None, attrs={}, text=None, **kwargs):
        self.name = name
        if isString(attrs):
            kwargs['class'] = attrs
            attrs = None
        if kwargs:
            if attrs:
                attrs = attrs.copy()
                attrs.update(kwargs)
            else:
                attrs = kwargs
        self.attrs = attrs
        self.text = text
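
    # How the constructor normalizes its arguments (hedged illustration;
    # the values are hypothetical):
    #
    #   SoupStrainer('a', 'nav').attrs                  # == {'class': 'nav'}
    #   SoupStrainer('a', href=True).attrs              # == {'href': True}
    #   SoupStrainer('a', {'id': 'x'}, href=True).attrs
    #   # == {'id': 'x', 'href': True}
    #
    # The caller's attrs dict is copied before the keyword arguments are
    # merged in, so it is never modified here. A strainer can also be
    # handed to the parser to build only part of a document, e.g.
    #   BeautifulSoup(markup, parseOnlyThese=SoupStrainer('a'))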

    def __str__(self):
        if self.text:
            return self.text
        else:
            return "%s|%s" % (self.name, self.attrs)

    def searchTag(self, markupName=None, markupAttrs={}):
        found = None
        markup = None
        if isinstance(markupName, Tag):
            markup = markupName
            markupAttrs = markup
        callFunctionWithTagData = callable(self.name) \
                                and not isinstance(markupName, Tag)

        if (not self.name) \
               or callFunctionWithTagData \
               or (markup and self._matches(markup, self.name)) \
               or (not markup and self._matches(markupName, self.name)):
            if callFunctionWithTagData:
                match = self.name(markupName, markupAttrs)
            else:
                match = True
                markupAttrMap = None
                for attr, matchAgainst in self.attrs.items():
                    if not markupAttrMap:
                        if hasattr(markupAttrs, 'get'):
                            markupAttrMap = markupAttrs
                        else:
                            markupAttrMap = {}
                            for k, v in markupAttrs:
                                markupAttrMap[k] = v
                    attrValue = markupAttrMap.get(attr)
                    if not self._matches(attrValue, matchAgainst):
                        match = False
                        break
            if match:
                if markup:
                    found = markup
                else:
                    found = markupName
        return found

    def search(self, markup):
        #print 'looking for %s in %s' % (self, markup)
        found = None
        # If given a list of items, scan it for a text element that
        # matches.
        if isList(markup) and not isinstance(markup, Tag):
            for element in markup:
                if isinstance(element, NavigableString) \
                       and self.search(element):
                    found = element
                    break
        # If it's a Tag, make sure its name or attributes match.
        # Don't bother with Tags if we're searching for text.
        elif isinstance(markup, Tag):
            if not self.text:
                found = self.searchTag(markup)
        # If it's text, make sure the text matches.
        elif isinstance(markup, NavigableString) or \
                 isString(markup):
            if self._matches(markup, self.text):
                found = markup
        else:
            raise Exception, "I don't know how to match against a %s" \
                  % markup.__class__
        return found

    def _matches(self, markup, matchAgainst):
        #print "Matching %s against %s" % (markup, matchAgainst)
        result = False
        if matchAgainst == True and type(matchAgainst) == types.BooleanType:
            result = markup != None
        elif callable(matchAgainst):
            result = matchAgainst(markup)
        else:
            #Custom match methods take the tag as an argument, but all
            #other ways of matching match the tag name as a string.
            if isinstance(markup, Tag):
                markup = markup.name
            if markup is not None and not isString(markup):
                markup = unicode(markup)
            #Now we know that markup is either a string, or None.
            if hasattr(matchAgainst, 'match'):
                # It's a regexp object.
                result = markup and matchAgainst.search(markup)
            elif (isList(matchAgainst)
                  and (markup is not None or not isString(matchAgainst))):
                result = markup in matchAgainst
            elif hasattr(matchAgainst, 'items'):
                # It's a map: check whether markup is one of its keys.
                result = markup in matchAgainst
            elif matchAgainst and isString(markup):
                if isinstance(markup, unicode):
                    matchAgainst = unicode(matchAgainst)
                else:
                    matchAgainst = str(matchAgainst)

            if not result:
                result = matchAgainst == markup
        return result
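
# Sketch (hypothetical helper, not part of Beautiful Soup): a standalone
# restatement of the order in which SoupStrainer._matches above tries
# each kind of criterion, so the rules can be checked in isolation.
# It assumes plain string values; the Tag-to-name step and the
# str/unicode coercion of the original are deliberately left out.
def _sketch_matches(value, criterion):
    if criterion is True:
        # True matches anything that is present at all.
        return value is not None
    if callable(criterion):
        # A function decides for itself.
        return bool(criterion(value))
    if hasattr(criterion, 'match'):
        # Compiled regexp: search() finds the pattern anywhere in value.
        return bool(value and criterion.search(value))
    if isinstance(criterion, (list, tuple)):
        # A list of alternatives: membership test.
        return value in criterion
    # Anything else falls back to plain equality.
    return criterion == value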

class ResultSet(list):
    """A ResultSet is just a list that keeps track of the SoupStrainer
    that created it."""
    def __init__(self, source):
        list.__init__(self)
        self.source = source
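
# Sketch (hypothetical class, not part of Beautiful Soup): the pattern
# ResultSet uses, a list subclass that also remembers where its items
# came from. Passing self to list.__init__ initialises this object
# itself. The optional items argument is an illustrative addition.
class _SketchResultSet(list):
    def __init__(self, source, items=()):
        list.__init__(self, items)
        self.source = source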

# Now, some helper functions.

def isList(l):
    """Convenience method that works with all 2.x versions of Python
    to determine whether or not something is listlike."""
    return ((hasattr(l, '__iter__') and not isString(l))
            or (type(l) in (types.ListType, types.TupleType)))

def isString(s):
    """Convenience method that works with all 2.x versions of Python
    to determine whether or not something is stringlike."""
    try:
        return isinstance(s, unicode) or isinstance(s, basestring)
    except NameError:
        return isinstance(s, str)
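
# Sketch (hypothetical names, not part of Beautiful Soup): the same
# "stringlike" and "listlike" tests as isString and isList above, with
# the Python 2 / Python 3 difference resolved once at import time
# instead of on every call. In Python 3, str defines __iter__ (in
# Python 2 it does not), so the list test must exclude strings.
try:
    _sketch_string_types = (basestring,)  # Python 2: str and unicode
except NameError:
    _sketch_string_types = (str,)         # Python 3

def _sketch_is_string(s):
    return isinstance(s, _sketch_string_types)

def _sketch_is_list(l):
    return hasattr(l, '__iter__') and not _sketch_is_string(l)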

def buildTagMap(default, *args):
    """Turns a list of maps, lists, or scalars into a single map.
    Used to build the SELF_CLOSING_TAGS, NESTABLE_TAGS, and
    NESTING_RESET_TAGS maps out of lists and partial maps."""
    built = {}
    for portion in args:
        if hasattr(portion, 'items'):
            #It's a map. Merge it.
            for k,v in portion.items():
                built[k] = v
        elif isList(portion) and not isString(portion):
            #It's a list. Map each item to the default.
            for k in portion:
                built[k] = default
        else:
            #It's a scalar. Map it to the default.
            built[portion] = default
    return built
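
# Sketch (hypothetical copy, not part of Beautiful Soup): buildTagMap's
# merge rules restated without the module's helpers. Maps are merged
# as-is, while lists, tuples and bare scalars map every name to the
# default value. Later portions override earlier ones.
def _sketch_build_tag_map(default, *portions):
    built = {}
    for portion in portions:
        if hasattr(portion, 'items'):
            built.update(portion)
        elif isinstance(portion, (list, tuple)):
            for name in portion:
                built[name] = default
        else:
            built[portion] = default
    return built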

# Now, the parser classes.
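
# Sketch (hypothetical helper, not part of Beautiful Soup): how the
# reference names handed to HTMLParserBuilder.handle_charref and
# handle_entityref can be decoded. HTMLParser passes "62" for &#62;,
# "x3E" for &#x3E; and "gt" for &gt;. The helper name and the choice
# to return None for unknown entity names are assumptions.
try:
    from htmlentitydefs import name2codepoint as _sketch_n2c  # Python 2
except ImportError:
    from html.entities import name2codepoint as _sketch_n2c   # Python 3
try:
    _sketch_unichr = unichr  # Python 2
except NameError:
    _sketch_unichr = chr     # Python 3

def _sketch_resolve_reference(ref, is_charref):
    """Return the character a reference stands for, or None if unknown."""
    if is_charref:
        # Numeric reference: hexadecimal if it starts with x or X.
        if ref[:1] in ('x', 'X'):
            return _sketch_unichr(int(ref[1:], 16))
        return _sketch_unichr(int(ref))
    # Named reference: look it up in the HTML entity table.
    codepoint = _sketch_n2c.get(ref)
    if codepoint is None:
        return None
    return _sketch_unichr(codepoint)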

class HTMLParserBuilder(HTMLParser):

    def __init__(self, soup):
        HTMLParser.__init__(self)
        self.soup = soup

    # We inherit feed() and reset().

    def handle_starttag(self, name, attrs):
        if name == 'meta':
            self.soup.extractCharsetFromMeta(attrs)
        else:
            self.soup.unknown_starttag(name, attrs)

    def handle_endtag(self, name):
        self.soup.unknown_endtag(name)

    def handle_data(self, content):
        self.soup.handle_data(content)

    def _toStringSubclass(self, text, subclass):
        """Adds a certain piece of text to the tree as a NavigableString
        subclass."""
        self.soup.endData()
        self.handle_data(text)
        self.soup.endData(subclass)

    def handle_pi(self, text):
        """Handle a processing instruction as a ProcessingInstruction
        object, possibly one with a %SOUP-ENCODING% slot into which an
        encoding will be plugged later."""
        if text[:3] == "xml":
            text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
        self._toStringSubclass(text, ProcessingInstruction)

    def handle_comment(self, text):
        "Handle comments as Comment objects."
        self._toStringSubclass(text, Comment)

    def handle_charref(self, ref):
        "Handle character references as data."
        if self.soup.convertEntities:
            if ref[:1] in ('x', 'X'):
                # Hexadecimal reference such as &#x3E;: HTMLParser
                # passes the digits with their leading 'x'.
                data = unichr(int(ref[1:], 16))
            else:
                data = unichr(int(ref))
        else:
            data = '&#%s;' % ref
        self.handle_data(data)

    def handle_entityref(self, ref):
        """Handle entity references as data, possibly converting known
        HTML and/or XML entity references to the corresponding Unicode
        characters."""
        data = None
        if self.soup.convertHTMLEntities:
            try:
                data = unichr(name2codepoint[ref])
            except KeyError:
                pass

        if not data and self.soup.convertXMLEntities:
            data = self.soup.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)

        if not data and self.soup.convertHTMLEntities and \
            not self.soup.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
                # TODO: We've got a problem here. We're told this is
                # an entity reference, but it's not an XML entity
                # reference or an HTML entity reference. Nonetheless,
                # the logical thing to do is to pass it through as an
                # unrecognized entity reference.
                #
                # Except: when the input is "&carol;" this function
                # will be called with input "carol". When the input is
                # "AT&T", this function will be called with input
                # "T". We have no way of knowing whether a semicolon
                # was present originally, so we don't know whether
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1079
                # this is an unknown entity or just a misplaced
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1080
                # ampersand.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1081
                #
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1082
                # The more common case is a misplaced ampersand, so I
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1083
                # escape the ampersand and omit the trailing semicolon.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1084
                data = "&amp;%s" % ref
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1085
        if not data:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1086
            # This case is different from the one above, because we
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1087
            # haven't already gone through a supposedly comprehensive
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1088
            # mapping of entities to Unicode characters. We might not
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1089
            # have gone through any mapping at all. So the chances are
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1090
            # very high that this is a real entity, and not a
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1091
            # misplaced ampersand.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1092
            data = "&%s;" % ref
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1093
        self.handle_data(data)
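        # Illustrative walk-through of the fallbacks above. This is a sketch
        # that assumes data is still empty when these checks start, i.e. ref
        # was not already converted through name2codepoint:
        #   - convertXMLEntities set and ref is in XML_ENTITIES_TO_SPECIAL_CHARS:
        #     the mapped special character is emitted.
        #   - convertHTMLEntities set and ref is not an XML entity name:
        #     "&amp;" + ref is emitted, so "AT&T" comes out as "AT&amp;T".
        #   - otherwise: "&" + ref + ";" is emitted, so "&carol;" is passed
        #     through unchanged.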

    def handle_decl(self, data):
        "Handle DOCTYPEs and the like as Declaration objects."
        self._toStringSubclass(data, Declaration)

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k + 3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = HTMLParser.parse_declaration(self, i)
            except HTMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j


class BeautifulStoneSoup(Tag):

    """This class contains the basic parser and search code. It defines
    a parser that knows nothing about tag behavior except for the
    following:

      You can't close a tag without closing all the tags it encloses.
      That is, "<foo><bar></foo>" actually means
      "<foo><bar></bar></foo>".

    [Another possible interpretation is "<foo><bar /></foo>", but since
    this class defines no SELF_CLOSING_TAGS, it will never use that
    interpretation.]

    This class is useful for parsing XML or made-up markup languages,
    or when BeautifulSoup makes an assumption counter to what you were
    expecting."""

    SELF_CLOSING_TAGS = {}
    NESTABLE_TAGS = {}
    RESET_NESTING_TAGS = {}
    QUOTE_TAGS = {}
    PRESERVE_WHITESPACE_TAGS = []

    MARKUP_MASSAGE = [(re.compile(r'(<[^<>]*)/>'),
                       lambda x: x.group(1) + ' />'),
                      (re.compile(r'<!\s+([^<>]*)>'),
                       lambda x: '<!' + x.group(1) + '>')
                      ]
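    # Illustrative effect of the two default fixes, assuming each
    # (pattern, replacement) pair is applied with pattern.sub():
    #   '<br/>'            ->  '<br />'
    #   '<! --Comment-->'  ->  '<!--Comment-->'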

    ROOT_TAG_NAME = u'[document]'

    HTML_ENTITIES = "html"
    XML_ENTITIES = "xml"
    XHTML_ENTITIES = "xhtml"
    # TODO: This only exists for backwards-compatibility
    ALL_ENTITIES = XHTML_ENTITIES

    # Used when determining whether a text node is all whitespace and
    # can be replaced with a single space. Only the ASCII whitespace
    # characters (tab, newline, form feed, carriage return and space)
    # are stripped; a text node that contains fancy Unicode spaces
    # (usually non-breaking) should be left alone.
    STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }
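    # Illustrative use: unicode.translate() deletes every code point that
    # maps to None, so
    #   u' \t\r\n'.translate(STRIP_ASCII_SPACES)  ->  u''
    #   u'\xa0'.translate(STRIP_ASCII_SPACES)     ->  u'\xa0'
    # A node holding only a non-breaking space would therefore not be
    # treated as all-whitespace.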

    def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
                 markupMassage=True, smartQuotesTo=XML_ENTITIES,
                 convertEntities=None, selfClosingTags=None, isHTML=False,
                 builder=HTMLParserBuilder):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser.

        HTMLParser will process most bad HTML, and the BeautifulSoup
        class has some tricks for dealing with some HTML that kills
        HTMLParser, but Beautiful Soup can nonetheless choke or lose data
        if your data uses self-closing tags or declarations
        incorrectly.

        By default, Beautiful Soup uses regexes to sanitize input,
        avoiding the vast majority of these problems. If the problems
        don't apply to you, pass in False for markupMassage, and
        you'll get better performance.

        The default parser massage techniques fix the two most common
        instances of invalid HTML that choke HTMLParser:

         <br/> (No space between name of self-closing tag and tag close)
         <! --Comment--> (Extraneous whitespace in declaration)

        You can pass in a custom list of (RE object, replace method)
        tuples to get Beautiful Soup to scrub your input the way you
        want."""

        self.parseOnlyThese = parseOnlyThese
        self.fromEncoding = fromEncoding
        self.smartQuotesTo = smartQuotesTo
        self.convertEntities = convertEntities
        # Set the rules for how we'll deal with the entities we
        # encounter.
        if self.convertEntities:
            # It doesn't make sense to convert encoded characters to
            # entities even while you're converting entities to Unicode.
            # Just convert it all to Unicode.
            self.smartQuotesTo = None
            if convertEntities == self.HTML_ENTITIES:
                self.convertXMLEntities = False
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = True
            elif convertEntities == self.XHTML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = False
            elif convertEntities == self.XML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = False
                self.escapeUnrecognizedEntities = False
        else:
            self.convertXMLEntities = False
            self.convertHTMLEntities = False
            self.escapeUnrecognizedEntities = False

        self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
        self.builder = builder(self)
        self.reset()

        if hasattr(markup, 'read'):        # It's a file-type object.
            markup = markup.read()
        self.markup = markup
        self.markupMassage = markupMassage
        try:
            self._feed(isHTML=isHTML)
        except StopParsing:
            pass
        self.markup = None                 # The markup can now be GCed.
        self.builder = None                # So can the builder.
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1235
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1236
    def _feed(self, inDocumentEncoding=None, isHTML=False):
        # Convert the document to Unicode.
        markup = self.markup
        if isinstance(markup, unicode):
            if not hasattr(self, 'originalEncoding'):
                self.originalEncoding = None
        else:
            dammit = UnicodeDammit\
                     (markup, [self.fromEncoding, inDocumentEncoding],
                      smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
            markup = dammit.unicode
            self.originalEncoding = dammit.originalEncoding
            self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
        if markup:
            if self.markupMassage:
                if not isList(self.markupMassage):
                    self.markupMassage = self.MARKUP_MASSAGE
                for fix, m in self.markupMassage:
                    markup = fix.sub(m, markup)
                # TODO: We get rid of markupMassage so that the
                # soup object can be deepcopied later on. Some
                # Python installations can't copy regexes. If anyone
                # was relying on the existence of markupMassage, this
                # might cause problems.
                del(self.markupMassage)
        self.builder.reset()

        self.builder.feed(markup)
        # Close out any unfinished strings and close all the open tags.
        self.endData()
        while self.currentTag.name != self.ROOT_TAG_NAME:
            self.popTag()

    def isSelfClosingTag(self, name):
        """Returns true iff the given string is the name of a
        self-closing tag according to this parser."""
        return self.SELF_CLOSING_TAGS.has_key(name) \
               or self.instanceSelfClosingTags.has_key(name)

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        self.builder.reset()
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)

    def popTag(self):
        tag = self.tagStack.pop()
        # A tag whose only child is a NavigableString gets that child as
        # its 'string' property, so that soup.tag.string is shorthand for
        # soup.tag.contents[0].
        if len(self.currentTag.contents) == 1 and \
           isinstance(self.currentTag.contents[0], NavigableString):
            self.currentTag.string = self.currentTag.contents[0]

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

    def endData(self, containerClass=NavigableString):
        if self.currentData:
            currentData = u''.join(self.currentData)
            # Whitespace-only data (nothing left after deleting the
            # characters in STRIP_ASCII_SPACES) collapses to a single '\n'
            # if it contains a newline, or to a single ' ' otherwise,
            # unless an open tag is listed in PRESERVE_WHITESPACE_TAGS.
            if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
                not set([tag.name for tag in self.tagStack]).intersection(
                    self.PRESERVE_WHITESPACE_TAGS)):
                if '\n' in currentData:
                    currentData = '\n'
                else:
                    currentData = ' '
            self.currentData = []
            if self.parseOnlyThese and len(self.tagStack) <= 1 and \
                   (not self.parseOnlyThese.text or \
                    not self.parseOnlyThese.search(currentData)):
                return
            o = containerClass(currentData)
            o.setup(self.currentTag, self.previous)
            if self.previous:
                self.previous.next = o
            self.previous = o
            self.currentTag.contents.append(o)


    def _popToTag(self, name, inclusivePop=True):
        """Pops the tag stack up to and including the most recent
        instance of the given tag. If inclusivePop is false, pops the tag
        stack up to but *not* including the most recent instance of
        the given tag."""
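        # Illustrative trace (hypothetical stack): if self.tagStack holds
        # tags named [<root>, 'html', 'p', 'b'], the loop below checks
        # indexes 3, 2 and 1 (index 0, the root, is never examined), finds
        # 'p' at index 2 and sets numPops = 4 - 2 = 2. So _popToTag('p')
        # pops 'b' and 'p' and returns the 'html' tag (popTag() returns
        # the new current tag), while _popToTag('p', inclusivePop=False)
        # pops only 'b' and returns 'p'. If no open tag has the given
        # name, nothing is popped and None is returned.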
        #print "Popping to %s" % name
        if name == self.ROOT_TAG_NAME:
            return

        numPops = 0
        mostRecentTag = None
        for i in range(len(self.tagStack)-1, 0, -1):
            if name == self.tagStack[i].name:
                numPops = len(self.tagStack)-i
                break
        if not inclusivePop:
            numPops = numPops - 1

        for i in range(0, numPops):
            mostRecentTag = self.popTag()
        return mostRecentTag

    def _smartPop(self, name):

        """We need to pop up to the previous tag of this type, unless
        one of this tag's nesting reset triggers comes between this
        tag and the previous tag of this type, OR unless this tag is a
        generic nesting trigger and another generic nesting trigger
        comes between this tag and the previous tag of this type.

        Examples:
         <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
         <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
         <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

         <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
         <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'.
         <td><tr><td> *<td>* should pop to 'tr', not the first 'td'.
        """

        nestingResetTriggers = self.NESTABLE_TAGS.get(name)
        isNestable = nestingResetTriggers != None
        isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
        popTo = None
        inclusive = True
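        # Illustrative trace of the '<li><ul><li> *<li>*' case above,
        # assuming NESTABLE_TAGS maps 'li' to ['ul', 'ol']: the tag stack
        # holds [<root>, li, ul, li], so nestingResetTriggers is
        # ['ul', 'ol'] and isNestable is true. The loop first sees the
        # inner 'li', which passes neither test (a nestable tag never
        # passes the first one, and 'li' is not a trigger), then reaches
        # 'ul', which is a trigger, so popTo = 'ul' and inclusive = False.
        # _popToTag('ul', False) then pops only the inner 'li', so the new
        # <li> is opened inside the <ul>, next to the inner 'li'.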
        for i in range(len(self.tagStack)-1, 0, -1):
            p = self.tagStack[i]
            if (not p or p.name == name) and not isNestable:
                #Non-nestable tags get popped to the top or to their
                #last occurrence.
                popTo = name
                break
            if (nestingResetTriggers != None
                and p.name in nestingResetTriggers) \
                or (nestingResetTriggers == None and isResetNesting
                    and self.RESET_NESTING_TAGS.has_key(p.name)):

                #If we encounter one of the nesting reset triggers
                #peculiar to this tag, or we encounter another tag
                #that causes nesting to reset, pop up to but not
                #including that tag.
                popTo = p.name
                inclusive = False
                break
            p = p.parent
        if popTo:
            self._popToTag(popTo, inclusive)

    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join(map(lambda(x, y): ' %s="%s"' % (x, y), attrs))
            self.handle_data('<%s%s>' % (name, attrs))
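            # For example, assuming 'script' is listed in QUOTE_TAGS and a
            # <script> element is open, a start tag parsed as name='b',
            # attrs=[('id', 'x')] is passed to handle_data() as the
            # literal text '<b id="x">'. Attribute values are always
            # re-emitted in double quotes, whatever quoting the source used.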
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1404
            return
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1405
        self.endData()
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1406
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1407
        if not self.isSelfClosingTag(name) and not selfClosing:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1408
            self._smartPop(name)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1409
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1410
        if self.parseOnlyThese and len(self.tagStack) <= 1 \
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1411
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1412
            return
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1413
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1414
        tag = Tag(self, name, attrs, self.currentTag, self.previous)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1415
        if self.previous:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1416
            self.previous.next = tag
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1417
        self.previous = tag
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1418
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.handle_data('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)

    def handle_data(self, data):
        self.currentData.append(data)

    def extractCharsetFromMeta(self, attrs):
        self.unknown_starttag('meta', attrs)


class BeautifulSoup(BeautifulStoneSoup):

    """This parser knows the following facts about HTML:

    * Some tags have no closing tag and should be interpreted as being
      closed as soon as they are encountered.

    * The text inside some tags (e.g. 'script') may contain tags which
      are not really part of the document and which should be parsed
      as text, not tags. If you want to parse the text as tags, you can
      always fetch it and parse it explicitly.

    * Tag nesting rules:

      Most tags can't be nested at all. For instance, the occurrence of
      a <p> tag should implicitly close the previous <p> tag.

       <p>Para1<p>Para2
        should be transformed into:
       <p>Para1</p><p>Para2

      Some tags can be nested arbitrarily. For instance, the occurrence
      of a <blockquote> tag should _not_ implicitly close the previous
      <blockquote> tag.

       Alice said: <blockquote>Bob said: <blockquote>Blah
        should NOT be transformed into:
       Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

      Some tags can be nested, but the nesting is reset by the
      interposition of other tags. For instance, a <tr> tag should
      implicitly close the previous <tr> tag within the same <table>,
      but not close a <tr> tag in another table.

       <table><tr>Blah<tr>Blah
        should be transformed into:
       <table><tr>Blah</tr><tr>Blah
        but,
       <tr>Blah<table><tr>Blah
        should NOT be transformed into:
       <tr>Blah<table></tr><tr>Blah

    Differing assumptions about tag nesting rules are a major source
    of problems with the BeautifulSoup class. If BeautifulSoup does not
    treat a tag as nestable but your page author does,
    try ICantBelieveItsBeautifulSoup, MinimalSoup, or
    BeautifulStoneSoup before writing your own subclass."""

    def __init__(self, *args, **kwargs):
        if 'smartQuotesTo' not in kwargs:
            kwargs['smartQuotesTo'] = self.HTML_ENTITIES
        kwargs['isHTML'] = True
        BeautifulStoneSoup.__init__(self, *args, **kwargs)

    SELF_CLOSING_TAGS = buildTagMap(None,
                                    ['br' , 'hr', 'input', 'img', 'meta',
                                    'spacer', 'link', 'frame', 'base'])

    PRESERVE_WHITESPACE_TAGS = set(['pre', 'textarea'])

    QUOTE_TAGS = {'script' : None, 'textarea' : None}

    #According to the HTML standard, each of these inline tags can
    #contain another tag of the same type. Furthermore, it's common
    #to actually use these tags this way.
    NESTABLE_INLINE_TAGS = ['span', 'font', 'q', 'object', 'bdo', 'sub', 'sup',
                            'center']

    #According to the HTML standard, these block tags can contain
    #another tag of the same type. Furthermore, it's common
    #to actually use these tags this way.
    NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']

    #Lists can contain other lists, but there are restrictions.
    NESTABLE_LIST_TAGS = { 'ol' : [],
                           'ul' : [],
                           'li' : ['ul', 'ol'],
                           'dl' : [],
                           'dd' : ['dl'],
                           'dt' : ['dl'] }

    #Tables can contain other tables, but there are restrictions.
    NESTABLE_TABLE_TAGS = {'table' : [],
                           'tr' : ['table', 'tbody', 'tfoot', 'thead'],
                           'td' : ['tr'],
                           'th' : ['tr'],
                           'thead' : ['table'],
                           'tbody' : ['table'],
                           'tfoot' : ['table'],
                           }

    NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre']

    #If one of these tags is encountered, all tags up to the next tag of
    #this type are popped.
    RESET_NESTING_TAGS = buildTagMap(None, NESTABLE_BLOCK_TAGS, 'noscript',
                                     NON_NESTABLE_BLOCK_TAGS,
                                     NESTABLE_LIST_TAGS,
                                     NESTABLE_TABLE_TAGS)

    NESTABLE_TAGS = buildTagMap([], NESTABLE_INLINE_TAGS, NESTABLE_BLOCK_TAGS,
                                NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS)

    # Used to detect the charset in a META tag; see extractCharsetFromMeta
    CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)

    def extractCharsetFromMeta(self, attrs):
        """Beautiful Soup can detect a charset included in a META tag,
        try to convert the document to that charset, and re-parse the
        document from the beginning."""
        httpEquiv = None
        contentType = None
        contentTypeIndex = None
        tagNeedsEncodingSubstitution = False

        for i in range(0, len(attrs)):
            key, value = attrs[i]
            key = key.lower()
            if key == 'http-equiv':
                httpEquiv = value
            elif key == 'content':
                contentType = value
                contentTypeIndex = i

        if httpEquiv and contentType: # It's an interesting meta tag.
            match = self.CHARSET_RE.search(contentType)
            if match:
                if (self.declaredHTMLEncoding is not None or
                    self.originalEncoding == self.fromEncoding):
                    # An HTML encoding was sniffed while converting
                    # the document to Unicode, or an HTML encoding was
                    # sniffed during a previous pass through the
                    # document, or an encoding was specified
                    # explicitly and it worked. Rewrite the meta tag.
                    def rewrite(match):
                        return match.group(1) + "%SOUP-ENCODING%"
                    newAttr = self.CHARSET_RE.sub(rewrite, contentType)
                    attrs[contentTypeIndex] = (attrs[contentTypeIndex][0],
                                               newAttr)
                    tagNeedsEncodingSubstitution = True
                else:
                    # This is our first pass through the document.
                    # Go through it again with the encoding information.
                    newCharset = match.group(3)
                    if newCharset and newCharset != self.originalEncoding:
                        self.declaredHTMLEncoding = newCharset
                        self._feed(self.declaredHTMLEncoding)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1594
                        raise StopParsing
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1595
                    pass
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1596
        tag = self.unknown_starttag("meta", attrs)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1597
        if tag and tagNeedsEncodingSubstitution:
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1598
            tag.containsSubstitutions = True
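# start_meta above handles a declared charset by re-feeding the document
# with that encoding and raising StopParsing to abandon the first pass.
# The following standalone Python 3 sketch shows the same two-pass
# control flow on raw bytes. RestartParsing, scan_meta, decode_html and
# both regexes are hypothetical; CHARSET_RE is only assumed to resemble
# the pattern whose group 3 start_meta reads.

```python
import re

# Assumed charset pattern: group 3 holds the value, as with the
# match.group(3) call in start_meta. Not necessarily the exact regex used.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)
# Hypothetical, deliberately naive META scanner for this sketch only.
META_CONTENT_RE = re.compile(r'<meta[^>]*content="([^"]*)"', re.I)


class RestartParsing(Exception):
    """Plays the role of StopParsing: abandons the first pass."""

    def __init__(self, encoding):
        super().__init__(encoding)
        self.encoding = encoding


def scan_meta(text, current_encoding):
    """Raise RestartParsing if a META tag declares a different charset."""
    for content in META_CONTENT_RE.findall(text):
        match = CHARSET_RE.search(content)
        if match:
            declared = match.group(3).strip()
            if declared and declared.lower() != current_encoding.lower():
                raise RestartParsing(declared)


def decode_html(raw, first_guess="latin-1"):
    """Decode bytes with a guess; redo it once if META says otherwise."""
    text = raw.decode(first_guess)  # latin-1 never fails, so pass 1 completes
    try:
        scan_meta(text, first_guess)
    except RestartParsing as restart:
        # Second pass, using the encoding the document declared.
        return raw.decode(restart.encoding), restart.encoding
    return text, first_guess
```

# The comparison is on charset names, not codecs, so a declaration such
# as "ISO-8859-1" still triggers a (harmless) second pass from "latin-1".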


class StopParsing(Exception):
    pass

class ICantBelieveItsBeautifulSoup(BeautifulSoup):

    """The BeautifulSoup class is oriented towards skipping over
    common HTML errors like unclosed tags. However, sometimes it makes
    errors of its own. For instance, consider this fragment:

     <b>Foo<b>Bar</b></b>

    This is perfectly valid (if bizarre) HTML. However, the
    BeautifulSoup class will implicitly close the first b tag when it
    encounters the second 'b'. It will think the author wrote
    "<b>Foo<b>Bar", and didn't close the first 'b' tag, because
    there's no real-world reason to bold something that's already
    bold. When it encounters '</b></b>' it will close two more 'b'
    tags, for a grand total of three tags closed instead of two. This
    can throw off the rest of your document structure. The same is
    true of a number of other tags, listed below.

    It's much more common for someone to forget to close a 'b' tag
    than to actually use nested 'b' tags, and the BeautifulSoup class
    handles the common case. This class handles the not-so-common
    case: where you can't believe someone wrote what they did, but
    it's valid HTML and BeautifulSoup screwed up by assuming it
    wouldn't be."""

    I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = \
     ['em', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'strong',
      'cite', 'code', 'dfn', 'kbd', 'samp', 'var', 'b']

    I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = ['noscript']

    NESTABLE_TAGS = buildTagMap([], BeautifulSoup.NESTABLE_TAGS,
                                I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS,
                                I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS)
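# A toy model of the difference, in standalone Python 3. It is not
# Beautiful Soup's algorithm: parse() and render() are hypothetical
# helpers handling only bare <tag> pairs and text. Passing nestable=set()
# approximates the BeautifulSoup default for 'b', and nestable={"b"}
# approximates the NESTABLE_TAGS map this class builds.

```python
import re

# Toy tokenizer: an open tag, a close tag, or a run of text.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def parse(markup, nestable):
    """Build a (name, children) tree from tag-only markup.

    A tag in `nestable` may contain itself. Any other tag that is opened
    while one of the same name is still open first closes that earlier
    tag (and anything opened inside it). Unmatched close tags are ignored.
    """
    root = ("[document]", [])
    stack = [root]
    for closing, name, text in TOKEN_RE.findall(markup):
        if text:
            stack[-1][1].append(text)
        elif closing:
            # Close the nearest open tag with this name, if there is one.
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth][0] == name:
                    del stack[depth:]
                    break
        else:
            if name not in nestable and any(n[0] == name for n in stack[1:]):
                # Implicitly close the earlier tag of the same name.
                while stack[-1][0] != name:
                    stack.pop()
                stack.pop()
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    return root


def render(node):
    """Serialize a tree produced by parse() back to markup."""
    name, children = node
    inner = "".join(c if isinstance(c, str) else render(c) for c in children)
    if name == "[document]":
        return inner
    return "<%s>%s</%s>" % (name, inner, name)


print(render(parse("<b>Foo<b>Bar</b></b>", nestable=set())))  # <b>Foo</b><b>Bar</b>
print(render(parse("<b>Foo<b>Bar</b></b>", nestable={"b"})))  # <b>Foo<b>Bar</b></b>
```

# In the first result the trailing '</b>' has nothing left to close; in
# a real document it would close some enclosing 'b', which is the
# structural damage the docstring above describes.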

class MinimalSoup(BeautifulSoup):
    """The MinimalSoup class is for parsing HTML that contains
    pathologically bad markup. It makes no assumptions about tag
    nesting, but it does know which tags are self-closing, that
    <script> tags contain JavaScript and should not be parsed, that
    META tags may contain encoding information, and so on.

    This also makes it better for subclassing than BeautifulStoneSoup
    or BeautifulSoup."""

    RESET_NESTING_TAGS = buildTagMap('noscript')
    NESTABLE_TAGS = {}

class BeautifulSOAP(BeautifulStoneSoup):
    """This class will push a tag with only a single string child into
    the tag's parent as an attribute. The attribute's name is the tag
    name, and the value is the string child. An example should give
    the flavor of the change:

    <foo><bar>baz</bar></foo>
     =>
    <foo bar="baz"><bar>baz</bar></foo>

    You can then access fooTag['bar'] instead of fooTag.barTag.string.

    This is, of course, useful for scraping structures that tend to
    use subelements instead of attributes, such as SOAP messages. Note
    that it modifies the parse tree, so printing it will not reproduce
    the original markup.

    I'm not sure how many people really want to use this class; let me
    know if you do. Mainly I like the name."""

    def popTag(self):
        if len(self.tagStack) > 1:
            tag = self.tagStack[-1]
            parent = self.tagStack[-2]
            parent._getAttrMap()
            if (isinstance(tag, Tag) and len(tag.contents) == 1 and
                isinstance(tag.contents[0], NavigableString) and
                not parent.attrMap.has_key(tag.name)):
                parent[tag.name] = tag.contents[0]
        BeautifulStoneSoup.popTag(self)
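# The same transformation can be approximated outside Beautiful Soup.
# This Python 3 sketch uses the standard library's
# xml.etree.ElementTree rather than BeautifulStoneSoup, and works on a
# finished tree instead of hooking popTag during parsing;
# promote_string_children is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def promote_string_children(element):
    """Copy each text-only child of `element` up as an attribute.

    Approximates popTag's conditions: the child holds only a string
    (here: it has text and no sub-elements) and the parent has no
    attribute of that name yet. Children are processed first, so the
    walk is bottom-up. The root itself is never promoted.
    """
    for child in list(element):
        promote_string_children(child)
        if len(child) == 0 and child.text and child.tag not in element.attrib:
            element.set(child.tag, child.text)
    return element


root = promote_string_children(ET.fromstring("<foo><bar>baz</bar></foo>"))
print(ET.tostring(root, encoding="unicode"))
# <foo bar="baz"><bar>baz</bar></foo>
```

# As with popTag, the child element stays in place; its string is
# duplicated onto the parent, so root.get("bar") returns "baz".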

#Enterprise class names! It has come to our attention that some people
#think the names of the Beautiful Soup parser classes are too silly
#and "unprofessional" for use in enterprise screen-scraping. We feel
#your pain! For such-minded folk, the Beautiful Soup Consortium And
#All-Night Kosher Bakery recommends renaming this file to
#"RobustParser.py" (or, in cases of extreme enterprisiness,
#"RobustParserBeanInterface.class") and using the following
#enterprise-friendly class aliases:
class RobustXMLParser(BeautifulStoneSoup):
    pass
class RobustHTMLParser(BeautifulSoup):
    pass
class RobustWackAssHTMLParser(ICantBelieveItsBeautifulSoup):
    pass
class RobustInsanelyWackAssHTMLParser(MinimalSoup):
    pass
class SimplifyingSOAPParser(BeautifulSOAP):
    pass

######################################################
#
# Bonus library: Unicode, Dammit
#
# This class forces XML data into a standard format (usually to UTF-8
# or Unicode).  It is heavily based on code from Mark Pilgrim's
# Universal Feed Parser. It does not rewrite the XML or HTML to
# reflect a new encoding: that happens in BeautifulStoneSoup.handle_pi
# (XML) and BeautifulSoup.start_meta (HTML).

# Autodetects character encodings.
# Download from http://chardet.feedparser.org/
try:
    import chardet
#    import chardet.constants
#    chardet.constants._debug = 1
except ImportError:
    chardet = None

# cjkcodecs and iconv_codec make Python know about more character encodings.
# Both are available from http://cjkpython.i18n.org/
# They're built in if you use Python 2.4.
try:
    import cjkcodecs.aliases
except ImportError:
    pass
try:
    import iconv_codec
except ImportError:
    pass
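# Whether a declared charset is usable comes down to whether Python's
# codec registry knows it (the registry that cjkcodecs and iconv_codec
# extend), helped by a small alias table such as UnicodeDammit's
# CHARSET_ALIASES. A minimal Python 3 sketch of that lookup follows;
# lookup_codec is a hypothetical helper, not the find_codec method, and
# it relies only on codecs.lookup raising LookupError for unknown names.

```python
import codecs

# A copy of the two entries UnicodeDammit.CHARSET_ALIASES defines.
CHARSET_ALIASES = {"macintosh": "mac-roman", "x-sjis": "shift-jis"}


def lookup_codec(charset):
    """Return Python's canonical codec name for a declared charset, or None.

    Tries the alias table first, then the name itself, then two common
    spelling variants ('-' removed, '-' replaced by '_').
    """
    name = (charset or "").strip().lower()
    if not name:
        return None
    candidates = (CHARSET_ALIASES.get(name, name), name,
                  name.replace("-", ""), name.replace("-", "_"))
    for candidate in candidates:
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            continue  # unknown to the registry; try the next spelling
    return None
```

# Returning codecs.lookup(...).name normalizes spellings, so "UTF-8"
# and "utf_8" both come back as "utf-8".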

class UnicodeDammit:
    """A class for detecting the encoding of a *ML document and
    converting it to a Unicode string. If the source encoding is
    windows-1252, it can replace MS smart quotes with their HTML or XML
    equivalents."""

    # This dictionary maps commonly seen values for "charset" in HTML
    # meta tags to the corresponding Python codec names. It only covers
    # values that aren't in Python's aliases and can't be determined
    # by the heuristics in find_codec.
    CHARSET_ALIASES = { "macintosh" : "mac-roman",
                        "x-sjis" : "shift-jis" }

    def __init__(self, markup, overrideEncodings=[],
                 smartQuotesTo='xml', isHTML=False):
        self.declaredHTMLEncoding = None
        self.markup, documentEncoding, sniffedEncoding = \
                     self._detectEncoding(markup, isHTML)
        self.smartQuotesTo = smartQuotesTo
        self.triedEncodings = []
        if markup == '' or isinstance(markup, unicode):
            self.originalEncoding = None
            self.unicode = unicode(markup)
            return

        u = None
        for proposedEncoding in overrideEncodings:
            u = self._convertFrom(proposedEncoding)
            if u: break
        if not u:
            for proposedEncoding in (documentEncoding, sniffedEncoding):
                u = self._convertFrom(proposedEncoding)
                if u: break

        # If no luck and we have an auto-detection library, try that:
        if not u and chardet and not isinstance(self.markup, unicode):
            u = self._convertFrom(chardet.detect(self.markup)['encoding'])

        # As a last resort, try utf-8 and windows-1252:
        if not u:
            for proposed_encoding in ("utf-8", "windows-1252"):
                u = self._convertFrom(proposed_encoding)
                if u: break

        self.unicode = u
        if not u: self.originalEncoding = None
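The cascade above tries caller-supplied overrides first, then the declared and sniffed encodings, then chardet's guess, and finally UTF-8 and windows-1252, while `_convertFrom` skips any codec already recorded in `triedEncodings`. A minimal Python 3 sketch of the same ordering follows. `decode_with_fallbacks` is a hypothetical helper, not part of Beautiful Soup, and it leaves out the optional chardet step so that it needs only the standard library.

```python
import codecs

def decode_with_fallbacks(data, override_encodings=(), declared=None, sniffed=None):
    """Try candidate encodings in priority order; return (text, encoding).

    Hypothetical helper: overrides, then declared/sniffed, then UTF-8 and
    windows-1252. The chardet step of the original is deliberately omitted.
    """
    tried = set()
    candidates = list(override_encodings) + [declared, sniffed, "utf-8", "windows-1252"]
    for proposed in candidates:
        if not proposed:
            continue
        try:
            # Normalise aliases so e.g. "UTF8" and "utf-8" count as one codec.
            name = codecs.lookup(proposed).name
        except LookupError:
            continue  # unknown charset name: skip it
        if name in tried:
            continue  # like triedEncodings: never attempt the same codec twice
        tried.add(name)
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    return None, None
```

For example, `decode_with_fallbacks(b"caf\xe9")` fails as UTF-8 and then succeeds as windows-1252 (reported by Python as `cp1252`).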

    def _subMSChar(self, match):
        """Changes a MS smart quote character to an XML or HTML
        entity."""
        orig = match.group(1)
        sub = self.MS_CHARS.get(orig)
        if type(sub) == types.TupleType:
            if self.smartQuotesTo == 'xml':
                sub = '&#x'.encode() + sub[1].encode() + ';'.encode()
            else:
                sub = '&'.encode() + sub[0].encode() + ';'.encode()
        else:
            sub = sub.encode()
        return sub
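`_subMSChar` is the replacement callback for a regular-expression substitution over the bytes 0x80 to 0x9F, which windows-1252 uses for curly quotes, dashes and similar characters. The Python 3 sketch below shows the same regex-callback pattern on `bytes`. `SMART_CHARS` is a hypothetical three-entry stand-in for the module's `MS_CHARS` table, assumed to map a byte to an (entity name, hex code point) pair as the tuple branch above expects; unlike the original, bytes missing from the table are passed through unchanged.

```python
import re

# Hypothetical, abbreviated table: byte -> (HTML entity name, hex code point).
SMART_CHARS = {
    b"\x93": ("ldquo", "201C"),  # left double quotation mark
    b"\x94": ("rdquo", "201D"),  # right double quotation mark
    b"\x97": ("mdash", "2014"),  # em dash
}

SMART_RE = re.compile(rb"([\x80-\x9f])")

def sub_smart_chars(markup, to="html"):
    """Replace windows-1252 bytes 0x80-0x9F with XML or HTML entities."""
    def repl(match):
        entry = SMART_CHARS.get(match.group(1))
        if entry is None:
            return match.group(1)  # unknown byte: leave it for the decoder
        name, hexcode = entry
        if to == "xml":
            return b"&#x" + hexcode.encode("ascii") + b";"
        return b"&" + name.encode("ascii") + b";"
    return SMART_RE.sub(repl, markup)
```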

    def _convertFrom(self, proposed):
        proposed = self.find_codec(proposed)
        if not proposed or proposed in self.triedEncodings:
            return None
        self.triedEncodings.append(proposed)
        markup = self.markup

        # Convert smart quotes to HTML if coming from an encoding
        # that might have them.
        if self.smartQuotesTo and proposed.lower() in("windows-1252",
                                                      "iso-8859-1",
                                                      "iso-8859-2"):
            smart_quotes_re = "([\x80-\x9f])"
            smart_quotes_compiled = re.compile(smart_quotes_re)
            markup = smart_quotes_compiled.sub(self._subMSChar, markup)

        try:
            # print "Trying to convert document to %s" % proposed
            u = self._toUnicode(markup, proposed)
            self.markup = u
            self.originalEncoding = proposed
        except Exception, e:
            # print "That didn't work!"
            # print e
            return None
        #print "Correct encoding: %s" % proposed
        return self.markup

    def _toUnicode(self, data, encoding):
        '''Given a string and its encoding, decodes the string into Unicode.
        %encoding is a string recognized by encodings.aliases'''

        # strip Byte Order Mark (if present)
        if (len(data) >= 4) and (data[:2] == '\xfe\xff') \
               and (data[2:4] != '\x00\x00'):
            encoding = 'utf-16be'
            data = data[2:]
        elif (len(data) >= 4) and (data[:2] == '\xff\xfe') \
                 and (data[2:4] != '\x00\x00'):
            encoding = 'utf-16le'
            data = data[2:]
        elif data[:3] == '\xef\xbb\xbf':
            encoding = 'utf-8'
            data = data[3:]
        elif data[:4] == '\x00\x00\xfe\xff':
            encoding = 'utf-32be'
            data = data[4:]
        elif data[:4] == '\xff\xfe\x00\x00':
            encoding = 'utf-32le'
            data = data[4:]
        newdata = unicode(data, encoding)
        return newdata
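`_toUnicode` lets a leading byte order mark override the proposed encoding and strips the mark before decoding. A simplified Python 3 sketch using the BOM constants from the standard `codecs` module follows. Instead of the `data[2:4] != '\x00\x00'` test, it resolves the FF FE ambiguity by checking the four-byte UTF-32 marks before the two-byte UTF-16 ones, so it is not byte-for-byte identical to the method above. `to_unicode` is a hypothetical name.

```python
import codecs

# Longest marks first, so a UTF-32LE BOM (FF FE 00 00) is not
# mistaken for a UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def to_unicode(data, encoding):
    """Decode bytes, letting a leading byte order mark override `encoding`."""
    for bom, bom_encoding in BOMS:
        if data.startswith(bom):
            # Strip the mark and decode with the encoding it implies.
            return data[len(bom):].decode(bom_encoding)
    return data.decode(encoding)
```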

    def _detectEncoding(self, xml_data, isHTML=False):
        """Given a document, tries to detect its XML encoding."""
        xml_encoding = sniffed_xml_encoding = None
        try:
            if xml_data[:4] == '\x4c\x6f\xa7\x94':
                # EBCDIC
                xml_data = self._ebcdic_to_ascii(xml_data)
            elif xml_data[:4] == '\x00\x3c\x00\x3f':
                # UTF-16BE
                sniffed_xml_encoding = 'utf-16be'
                xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
            elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') \
                     and (xml_data[2:4] != '\x00\x00'):
                # UTF-16BE with BOM
                sniffed_xml_encoding = 'utf-16be'
                xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
            elif xml_data[:4] == '\x3c\x00\x3f\x00':
                # UTF-16LE
                sniffed_xml_encoding = 'utf-16le'
                xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
            elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and \
                     (xml_data[2:4] != '\x00\x00'):
                # UTF-16LE with BOM
                sniffed_xml_encoding = 'utf-16le'
                xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
            elif xml_data[:4] == '\x00\x00\x00\x3c':
                # UTF-32BE
                sniffed_xml_encoding = 'utf-32be'
                xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
            elif xml_data[:4] == '\x3c\x00\x00\x00':
                # UTF-32LE
                sniffed_xml_encoding = 'utf-32le'
                xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
            elif xml_data[:4] == '\x00\x00\xfe\xff':
                # UTF-32BE with BOM
                sniffed_xml_encoding = 'utf-32be'
                xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
            elif xml_data[:4] == '\xff\xfe\x00\x00':
                # UTF-32LE with BOM
                sniffed_xml_encoding = 'utf-32le'
                xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
            elif xml_data[:3] == '\xef\xbb\xbf':
                # UTF-8 with BOM
                sniffed_xml_encoding = 'utf-8'
                xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
            else:
                sniffed_xml_encoding = 'ascii'
                pass
        except:
            xml_encoding_match = None
        xml_encoding_re = '^<\?.*encoding=[\'"](.*?)[\'"].*\?>'.encode()
        xml_encoding_match = re.compile(xml_encoding_re).match(xml_data)
        if not xml_encoding_match and isHTML:
            meta_re = '<\s*meta[^>]+charset=([^>]*?)[;\'">]'.encode()
            regexp = re.compile(meta_re, re.I)
            xml_encoding_match = regexp.search(xml_data)
        if xml_encoding_match is not None:
            xml_encoding = xml_encoding_match.groups()[0].decode(
                'ascii').lower()
            if isHTML:
                self.declaredHTMLEncoding = xml_encoding
            if sniffed_xml_encoding and \
               (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode',
                                 'iso-10646-ucs-4', 'ucs-4', 'csucs4',
                                 'utf-16', 'utf-32', 'utf_16', 'utf_32',
                                 'utf16', 'u16')):
                xml_encoding = sniffed_xml_encoding
        return xml_data, xml_encoding, sniffed_xml_encoding
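`_detectEncoding` sniffs the first four bytes for the patterns that `<?` (or `<?xm` in EBCDIC) produces in encodings that are not ASCII-compatible, re-encodes the data as UTF-8, and then reads the `encoding=` pseudo-attribute of the XML declaration or, for HTML, the `charset=` of a `<meta>` tag. The Python 3 sketch below follows that flow with the same two regular expressions. Assumptions: `detect_encoding` is hypothetical, only the BOM-less signatures are covered, and EBCDIC is decoded with the standard `cp037` code page (and reported as the sniffed encoding) rather than through the module's own translation table.

```python
import re

# Byte patterns that "<?" (or "<?xm" for EBCDIC) produces in encodings that
# are not ASCII-compatible; BOM-prefixed input is not handled in this sketch.
SIGNATURES = [
    (b"\x4c\x6f\xa7\x94", "cp037"),      # EBCDIC; cp037 is an assumed code page
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
]

XML_DECL_RE = re.compile(rb"""^<\?.*encoding=['"](.*?)['"].*\?>""")
META_RE = re.compile(rb"""<\s*meta[^>]+charset=([^>]*?)[;'">]""", re.I)

# Declared names that do not fix a byte order (a subset of the list above).
BYTE_ORDER_NEUTRAL = ("utf-16", "utf-32", "utf_16", "utf_32", "utf16", "u16")

def detect_encoding(data, is_html=False):
    """Return (declared_encoding, sniffed_encoding) for a byte string."""
    sniffed = None
    for signature, encoding in SIGNATURES:
        if data.startswith(signature):
            sniffed = encoding
            # Re-encode as UTF-8 so the ASCII-based regexes below can match.
            data = data.decode(encoding).encode("utf-8")
            break
    match = XML_DECL_RE.match(data)
    if match is None and is_html:
        match = META_RE.search(data)
    declared = match.group(1).decode("ascii").lower() if match else None
    # A bare "utf-16" declaration gives no byte order; trust what was sniffed.
    if sniffed and declared in BYTE_ORDER_NEUTRAL:
        declared = sniffed
    return declared, sniffed
```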


    def find_codec(self, charset):
        return self._codec(self.CHARSET_ALIASES.get(charset, charset)) \
               or (charset and self._codec(charset.replace("-", ""))) \
               or (charset and self._codec(charset.replace("-", "_"))) \
               or charset

    def _codec(self, charset):
        if not charset: return charset
        codec = None
        try:
            codecs.lookup(charset)
            codec = charset
        except (LookupError, ValueError):
            pass
        return codec
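`find_codec` normalises a charset name by trying an alias table, then the name with hyphens removed, then with hyphens turned into underscores, and accepts the first variant that `codecs.lookup` recognises (the check `_codec` performs); if none is recognised it returns the name unchanged. A Python 3 sketch of the same fallback chain follows. The two-entry `ALIASES` dict is an illustrative stand-in for `CHARSET_ALIASES`, whose contents are not shown in this section, and `lookup_codec`/`find_codec` here are hypothetical free functions rather than the methods above.

```python
import codecs

# Illustrative stand-in for CHARSET_ALIASES (assumed contents).
ALIASES = {"macintosh": "mac-roman", "x-sjis": "shift-jis"}

def lookup_codec(charset):
    """Return charset if Python has a codec for it, else None."""
    if not charset:
        return None
    try:
        codecs.lookup(charset)
    except (LookupError, ValueError):
        return None
    return charset

def find_codec(charset):
    """Alias table, then '-' removed, then '-' -> '_'; else the name as given."""
    return (lookup_codec(ALIASES.get(charset, charset))
            or (charset and lookup_codec(charset.replace("-", "")))
            or (charset and lookup_codec(charset.replace("-", "_")))
            or charset)
```

In Python 3, `codecs.lookup` already treats `-` and `_` as equivalent, so the later fallbacks mostly matter for names that only resolve once their hyphens are removed.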

    EBCDIC_TO_ASCII_MAP = None
    def _ebcdic_to_ascii(self, s):
        c = self.__class__
        if not c.EBCDIC_TO_ASCII_MAP:
            emap = (0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
                    16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
                    128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
                    144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
                    32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
                    38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
                    45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
                    186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
                    195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,
                    201,202,106,107,108,109,110,111,112,113,114,203,204,205,
                    206,207,208,209,126,115,116,117,118,119,120,121,122,210,
                    211,212,213,214,215,216,217,218,219,220,221,222,223,224,
                    225,226,227,228,229,230,231,123,65,66,67,68,69,70,71,72,
                    73,232,233,234,235,236,237,125,74,75,76,77,78,79,80,81,
                    82,238,239,240,241,242,243,92,159,83,84,85,86,87,88,89,
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1953
                    90,244,245,246,247,248,249,48,49,50,51,52,53,54,55,56,57,
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1954
                    250,251,252,253,254,255)
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1955
            import string
b60a149520e7 Added downloadkit.py - script to download and unpack a PDK
William Roberts <williamr@symbian.org>
parents:
diff changeset
  1956
            # Build the 256-byte translation table on the first call and
            # cache it on the class, so later calls reuse it.
            c.EBCDIC_TO_ASCII_MAP = string.maketrans(
                ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
        return s.translate(c.EBCDIC_TO_ASCII_MAP)

    # Windows-1252 bytes 0x80-0x9F: each maps to an (entity name, hex code
    # point) pair, or to a plain string for bytes cp1252 leaves undefined.
    MS_CHARS = { '\x80' : ('euro', '20AC'),
                 '\x81' : ' ',
                 '\x82' : ('sbquo', '201A'),
                 '\x83' : ('fnof', '192'),
                 '\x84' : ('bdquo', '201E'),
                 '\x85' : ('hellip', '2026'),
                 '\x86' : ('dagger', '2020'),
                 '\x87' : ('Dagger', '2021'),
                 '\x88' : ('circ', '2C6'),
                 '\x89' : ('permil', '2030'),
                 '\x8A' : ('Scaron', '160'),
                 '\x8B' : ('lsaquo', '2039'),
                 '\x8C' : ('OElig', '152'),
                 '\x8D' : '?',
                 '\x8E' : ('#x17D', '17D'),
                 '\x8F' : '?',
                 '\x90' : '?',
                 '\x91' : ('lsquo', '2018'),
                 '\x92' : ('rsquo', '2019'),
                 '\x93' : ('ldquo', '201C'),
                 '\x94' : ('rdquo', '201D'),
                 '\x95' : ('bull', '2022'),
                 '\x96' : ('ndash', '2013'),
                 '\x97' : ('mdash', '2014'),
                 '\x98' : ('tilde', '2DC'),
                 '\x99' : ('trade', '2122'),
                 '\x9a' : ('scaron', '161'),
                 '\x9b' : ('rsaquo', '203A'),
                 '\x9c' : ('oelig', '153'),
                 '\x9d' : '?',
                 '\x9e' : ('#x17E', '17E'),
                 '\x9f' : ('Yuml', '178'),}
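# Illustrative sketch, not part of Beautiful Soup: each MS_CHARS entry maps a
# Windows-1252 byte either to an (entity name, hex code point) pair or to a
# plain replacement string. The hypothetical helper below applies a table of
# that shape to a string, emitting hexadecimal character references
# (xml=True) or named entities (xml=False). It uses only constructs that
# run on both Python 2 and Python 3.
def replace_ms_chars(text, table, xml=True):
    """Return `text` with every character found in `table` substituted."""
    out = []
    for ch in text:
        sub = table.get(ch)
        if sub is None:
            out.append(ch)                  # not a mapped character
        elif not isinstance(sub, tuple):
            out.append(sub)                 # plain replacement string
        elif not xml:
            out.append('&%s;' % sub[0])     # named entity, e.g. &ldquo;
        elif sub[1]:
            out.append('&#x%s;' % sub[1])   # hex reference, e.g. &#x201C;
        else:
            out.append(ch)                  # no code point recorded; keep it

    return ''.join(out)

# Example with a small excerpt of the table:
#   excerpt = {'\x93': ('ldquo', '201C'), '\x94': ('rdquo', '201D')}
#   replace_ms_chars('\x93hi\x94', excerpt)             # -> '&#x201C;hi&#x201D;'
#   replace_ms_chars('\x93hi\x94', excerpt, xml=False)  # -> '&ldquo;hi&rdquo;'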
#######################################################################


#By default, act as an HTML pretty-printer.
if __name__ == '__main__':
    import sys
    soup = BeautifulSoup(sys.stdin)
    print soup.prettify()
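
# Illustrative sketch, not part of Beautiful Soup: string.maketrans, used by
# _ebcdic_to_ascii above, exists only on Python 2. The hypothetical helpers
# below build the same kind of 256-byte translation table with bytearray,
# which (assuming Python 2.6 or later, or Python 3) works on both lines, and
# apply it with the byte string's translate() method.
def build_byte_table(mapping):
    """Return a 256-byte table in which byte i translates to mapping[i]."""
    if len(mapping) != 256:
        raise ValueError('a byte translation table needs exactly 256 entries')
    # bytes(bytearray(...)) yields str on Python 2 and bytes on Python 3.
    return bytes(bytearray(mapping))

def translate_bytes(data, table):
    """Translate each byte of the byte string `data` through `table`."""
    return data.translate(table)

# Example with a toy mapping: identity except for three entries that agree
# with the EBCDIC table above (0x40 -> space, 0x81 -> 'a', 0xC1 -> 'A'):
#   toy = list(range(256))
#   toy[0x40], toy[0x81], toy[0xC1] = ord(' '), ord('a'), ord('A')
#   translate_bytes(b'\xc1\x40\x81', build_byte_table(toy))  # -> b'A a'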