src/3rdparty/clucene/README
author Dremov Kirill (Nokia-D-MSW/Tampere) <kirill.dremov@nokia.com>
Fri, 19 Feb 2010 23:40:16 +0200
branchRCL_3
changeset 4 3b1da2848fc7
parent 0 1918ee327afb
permissions -rw-r--r--
Revision: 201003 Kit: 201007
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     1
CLucene README
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     2
==============
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     3
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     4
------------------------------------------------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     5
CLucene is a C++ port of Lucene.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     6
It is a high-performance, full-featured text search 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     7
engine written in C++. CLucene is faster than lucene
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     8
as it is written in C++.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     9
------------------------------------------------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    10
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    11
CLucene has contributions from many, see AUTHORS
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    12
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    13
CLucene is distributed under the GNU Lesser General Public License (LGPL) 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    14
	*or*
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    15
the Apache License, Version 2.0
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    16
See the LGPL.license and APACHE.license for the respective license information.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    17
Read COPYING for more about the license.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    18
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    19
Installation
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    20
------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    21
* For Linux, MacOSX, cygwin and MinGW build information, read INSTALL.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    22
* Boost.Jam files are provided in the root directory and subdirectories.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    23
* Microsoft Visual Studio (6&7) are provided in the win32 folder.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    24
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    25
Mailing List
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    26
------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    27
Questions and discussion should be directed to the CLucene mailing list
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    28
  at clucene-developers@lists.sourceforge.net  
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    29
Find subscription instructions at 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    30
  http://lists.sourceforge.net/lists/listinfo/clucene-developers
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    31
Suggestions and bug reports can be made on our bug tracking database
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    32
  (http://sourceforge.net/tracker/?group_id=80013&atid=558446)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    33
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    34
The latest version
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    35
------------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    36
Details of the latest version can be found on the CLucene sourceforge project
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    37
web site: http://www.sourceforge.net/projects/clucene
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    38
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    39
Documentation
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    40
-------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    41
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    42
You can also build your own documentation by running doxygen from the root directory
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    43
of clucene.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    44
CLucene is a very close port of Java Lucene, so you can also try looking at the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    45
Java Docs on http://lucene.apache.org/java/
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    46
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    47
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    48
Performance
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    49
-----------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    50
Very little benchmarking has been done on clucene. Andi Vajda posted some 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    51
limited statistics on the clucene list a while ago with the following results.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    52
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    53
There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    54
6108kb of HTML text. 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    55
org.apache.lucene.demo.IndexFiles with java and gcj: 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    56
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    57
    . running with java 1.4.1_01-99 : 20379 ms
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    58
    . running with gcj 3.3.2 -O2    : 17842 ms
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    59
    . running clucene 0.8.9's demo  :  9930 ms 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    60
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    61
I recently did some more tests and came up with these rough tests:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    62
663mb (797 files) of Guttenberg texts 
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    63
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    64
• Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    65
• Clucene: 232141. peak mem usage ~60, avg ~4mb ram
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    66
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    67
Searching indexing using 10,000 single word queries
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    68
• Jlucene: ~60078ms and used ~13mb ram
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    69
• Clucene: ~48359ms and used ~4.2mb ram
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    70
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    71
Platform notes
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    72
--------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    73
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    74
'Too many open files'
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    75
Some platforms don't provide enough file handles to run CLucene properly.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    76
To solve this, increase the open file limit:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    77
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    78
On Solaris:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    79
ulimit -n 1024
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    80
set rlim_fd_cur=1024
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    81
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    82
Acknowledgments
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    83
----------------
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    84
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    85
The Apache Lucene project is the basis for this software, so the biggest
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    86
acknoledgment goes to that project.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    87
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    88
We wish to acknowledge the following copyrighted works that
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    89
make up portions of the CLucene software:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    90
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    91
CLucene relies heavily on the use of autoconf and libtool to provide
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    92
a build environment.