author | eckhart.koppen@nokia.com |
Wed, 31 Mar 2010 11:06:36 +0300 | |
changeset 7 | f7bc934e204c |
permissions | -rw-r--r-- |
7
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
1 |
CLucene README |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
2 |
============== |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
3 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
4 |
------------------------------------------------------ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
5 |
CLucene is a C++ port of Lucene. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
6 |
It is a high-performance, full-featured text search |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
7 |
engine written in C++. CLucene is faster than lucene |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
8 |
as it is written in C++. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
9 |
------------------------------------------------------ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
10 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
11 |
CLucene has contributions from many, see AUTHORS |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
12 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
13 |
CLucene is distributed under the GNU Lesser General Public License (LGPL) |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
14 |
*or* |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
15 |
the Apache License, Version 2.0 |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
16 |
See the LGPL.license and APACHE.license for the respective license information. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
17 |
Read COPYING for more about the license. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
18 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
19 |
Installation |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
20 |
------------ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
21 |
* For Linux, MacOSX, cygwin and MinGW build information, read INSTALL. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
22 |
* Boost.Jam files are provided in the root directory and subdirectories. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
23 |
* Microsoft Visual Studio (6&7) are provided in the win32 folder. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
24 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
25 |
Mailing List |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
26 |
------------ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
27 |
Questions and discussion should be directed to the CLucene mailing list |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
28 |
at clucene-developers@lists.sourceforge.net |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
29 |
Find subscription instructions at |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
30 |
http://lists.sourceforge.net/lists/listinfo/clucene-developers |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
31 |
Suggestions and bug reports can be made on our bug tracking database |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
32 |
(http://sourceforge.net/tracker/?group_id=80013&atid=558446) |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
33 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
34 |
The latest version |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
35 |
------------------ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
36 |
Details of the latest version can be found on the CLucene sourceforge project |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
37 |
web site: http://www.sourceforge.net/projects/clucene |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
38 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
39 |
Documentation |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
40 |
------------- |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
41 |
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
42 |
You can also build your own documentation by running doxygen from the root directory |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
43 |
of clucene. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
44 |
CLucene is a very close port of Java Lucene, so you can also try looking at the |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
45 |
Java Docs on http://lucene.apache.org/java/ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
46 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
47 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
48 |
Performance |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
49 |
----------- |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
50 |
Very little benchmarking has been done on clucene. Andi Vajda posted some |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
51 |
limited statistics on the clucene list a while ago with the following results. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
52 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
53 |
There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
54 |
6108kb of HTML text. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
55 |
org.apache.lucene.demo.IndexFiles with java and gcj: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
56 |
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
57 |
. running with java 1.4.1_01-99 : 20379 ms |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
58 |
. running with gcj 3.3.2 -O2 : 17842 ms |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
59 |
. running clucene 0.8.9's demo : 9930 ms |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
60 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
61 |
I recently did some more tests and came up with these rough tests: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
62 |
663mb (797 files) of Guttenberg texts |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
63 |
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
64 |
• Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
65 |
• Clucene: 232141. peak mem usage ~60, avg ~4mb ram |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
66 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
67 |
Searching indexing using 10,000 single word queries |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
68 |
• Jlucene: ~60078ms and used ~13mb ram |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
69 |
• Clucene: ~48359ms and used ~4.2mb ram |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
70 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
71 |
Platform notes |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
72 |
-------------- |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
73 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
74 |
'Too many open files' |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
75 |
Some platforms don't provide enough file handles to run CLucene properly. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
76 |
To solve this, increase the open file limit: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
77 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
78 |
On Solaris: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
79 |
ulimit -n 1024 |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
80 |
set rlim_fd_cur=1024 |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
81 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
82 |
Acknowledgments |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
83 |
---------------- |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
84 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
85 |
The Apache Lucene project is the basis for this software, so the biggest |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
86 |
acknoledgment goes to that project. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
87 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
88 |
We wish to acknowledge the following copyrighted works that |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
89 |
make up portions of the CLucene software: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
90 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
91 |
CLucene relies heavily on the use of autoconf and libtool to provide |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
92 |
a build environment. |