util/src/3rdparty/clucene/README
author eckhart.koppen@nokia.com
Wed, 31 Mar 2010 11:06:36 +0300
changeset 7 f7bc934e204c
permissions -rw-r--r--
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
7
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     1
CLucene README
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     2
==============
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     3
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     4
------------------------------------------------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     5
CLucene is a C++ port of Lucene.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     6
It is a high-performance, full-featured text search 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     7
engine written in C++. CLucene is faster than lucene
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     8
as it is written in C++.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
     9
------------------------------------------------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    10
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    11
CLucene has contributions from many, see AUTHORS
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    12
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    13
CLucene is distributed under the GNU Lesser General Public License (LGPL) 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    14
	*or*
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    15
the Apache License, Version 2.0
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    16
See the LGPL.license and APACHE.license for the respective license information.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    17
Read COPYING for more about the license.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    18
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    19
Installation
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    20
------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    21
* For Linux, MacOSX, cygwin and MinGW build information, read INSTALL.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    22
* Boost.Jam files are provided in the root directory and subdirectories.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    23
* Microsoft Visual Studio (6&7) are provided in the win32 folder.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    24
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    25
Mailing List
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    26
------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    27
Questions and discussion should be directed to the CLucene mailing list
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    28
  at clucene-developers@lists.sourceforge.net  
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    29
Find subscription instructions at 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    30
  http://lists.sourceforge.net/lists/listinfo/clucene-developers
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    31
Suggestions and bug reports can be made on our bug tracking database
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    32
  (http://sourceforge.net/tracker/?group_id=80013&atid=558446)
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    33
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    34
The latest version
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    35
------------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    36
Details of the latest version can be found on the CLucene sourceforge project
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    37
web site: http://www.sourceforge.net/projects/clucene
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    38
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    39
Documentation
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    40
-------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    41
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    42
You can also build your own documentation by running doxygen from the root directory
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    43
of clucene.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    44
CLucene is a very close port of Java Lucene, so you can also try looking at the
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    45
Java Docs on http://lucene.apache.org/java/
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    46
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    47
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    48
Performance
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    49
-----------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    50
Very little benchmarking has been done on clucene. Andi Vajda posted some 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    51
limited statistics on the clucene list a while ago with the following results.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    52
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    53
There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    54
6108kb of HTML text. 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    55
org.apache.lucene.demo.IndexFiles with java and gcj: 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    56
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    57
    . running with java 1.4.1_01-99 : 20379 ms
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    58
    . running with gcj 3.3.2 -O2    : 17842 ms
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    59
    . running clucene 0.8.9's demo  :  9930 ms 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    60
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    61
I recently did some more tests and came up with these rough tests:
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    62
663mb (797 files) of Guttenberg texts 
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    63
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    64
• Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    65
• Clucene: 232141. peak mem usage ~60, avg ~4mb ram
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    66
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    67
Searching indexing using 10,000 single word queries
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    68
• Jlucene: ~60078ms and used ~13mb ram
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    69
• Clucene: ~48359ms and used ~4.2mb ram
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    70
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    71
Platform notes
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    72
--------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    73
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    74
'Too many open files'
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    75
Some platforms don't provide enough file handles to run CLucene properly.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    76
To solve this, increase the open file limit:
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    77
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    78
On Solaris:
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    79
ulimit -n 1024
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    80
set rlim_fd_cur=1024
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    81
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    82
Acknowledgments
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    83
----------------
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    84
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    85
The Apache Lucene project is the basis for this software, so the biggest
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    86
acknoledgment goes to that project.
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    87
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    88
We wish to acknowledge the following copyrighted works that
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    89
make up portions of the CLucene software:
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    90
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    91
CLucene relies heavily on the use of autoconf and libtool to provide
f7bc934e204c 5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff changeset
    92
a build environment.