symbian-qemu-0.9.1-12/python-2.6.1/Doc/library/robotparser.rst
       
:mod:`robotparser` --- Parser for robots.txt
============================================

.. module:: robotparser
   :synopsis: Loads a robots.txt file and answers questions about
              fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

.. note::
   The :mod:`robotparser` module has been renamed to :mod:`urllib.robotparser`
   in Python 3.0.  The :term:`2to3` tool will automatically adapt imports when
   converting your sources to 3.0.

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file.  For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.


.. class:: RobotFileParser()

   This class provides a set of methods to read, parse and answer questions
   about a single :file:`robots.txt` file.


   .. method:: set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.


   .. method:: read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.


   .. method:: parse(lines)

      Parses the *lines* argument, a list of lines from a :file:`robots.txt`
      file.


   .. method:: can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.


   .. method:: mtime()

      Returns the time the ``robots.txt`` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      ``robots.txt`` files periodically.


   .. method:: modified()

      Sets the time the ``robots.txt`` file was last fetched to the current
      time.

The following example demonstrates basic use of the :class:`RobotFileParser`
class. ::

   >>> import robotparser
   >>> rp = robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True
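
The example above fetches a live :file:`robots.txt` over the network.  As a
complementary sketch, :meth:`parse` can also be fed a list of lines directly,
which is handy for testing without network access; the host, paths and rules
below are made up for illustration.  The import fallback follows the rename
described in the note above, and :meth:`modified` / :meth:`mtime` are used to
record and inspect the fetch time.

```python
try:
    import robotparser                        # Python 2 module name
except ImportError:
    import urllib.robotparser as robotparser  # renamed in Python 3.0

# A hypothetical robots.txt, supplied as a list of lines rather than
# fetched from a server.
lines = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.modified()    # record the current time as the "last fetched" time
rp.parse(lines)

print(rp.can_fetch("*", "http://www.example.com/cgi-bin/search"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))      # True
print(rp.mtime() > 0)                                              # True
```

Because ``User-agent: *`` applies to every crawler, any *useragent* string
passed to :meth:`can_fetch` is matched against the same two ``Disallow``
rules here.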