CLucene README
==============

------------------------------------------------------
CLucene is a C++ port of Lucene: a high-performance,
full-featured text search engine written in C++.
In the limited benchmarks listed under Performance,
CLucene ran faster than Java Lucene.
------------------------------------------------------

CLucene has contributions from many people; see AUTHORS.

CLucene is distributed under the GNU Lesser General Public License (LGPL)
*or*
the Apache License, Version 2.0.
See LGPL.license and APACHE.license for the respective license information.
Read COPYING for more about the license.

Installation
------------
* For Linux, MacOSX, cygwin and MinGW build information, read INSTALL.
* Boost.Jam files are provided in the root directory and subdirectories.
* Microsoft Visual Studio (6 and 7) project files are provided in the win32 folder.

Mailing List
------------
Questions and discussion should be directed to the CLucene mailing list
at clucene-developers@lists.sourceforge.net
Find subscription instructions at
http://lists.sourceforge.net/lists/listinfo/clucene-developers
Suggestions and bug reports can be made on our bug tracking database
(http://sourceforge.net/tracker/?group_id=80013&atid=558446)

The latest version
------------------
Details of the latest version can be found on the CLucene SourceForge project
web site: http://www.sourceforge.net/projects/clucene

Documentation
-------------
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
You can also build your own documentation by running doxygen from the root directory
of CLucene.
CLucene is a very close port of Java Lucene, so you can also try looking at the
Javadocs at http://lucene.apache.org/java/


Performance
-----------
Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago, with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
 . running with java 1.4.1_01-99 : 20379 ms
 . running with gcj 3.3.2 -O2    : 17842 ms
 . running clucene 0.8.9's demo  :  9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
 • Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
 • Clucene: 232141ms. peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
 • Jlucene: ~60078ms and used ~13mb ram
 • Clucene: ~48359ms and used ~4.2mb ram

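To put the timings above in relative terms, the speedup of one run over another is the baseline time divided by the candidate time. The sketch below is only an illustration of that arithmetic: the function name speedup is made up for this README and is not part of CLucene, and the inputs are the millisecond figures quoted above.

```cpp
#include <cstdio>

// Hypothetical helper (not part of CLucene): how many times faster the
// candidate run is than the baseline run, given both times in milliseconds.
double speedup(double baselineMs, double candidateMs) {
    return baselineMs / candidateMs;
}

// Prints the ratios for the three comparisons reported in this section.
void reportSpeedups() {
    std::printf("IndexFiles demo, java vs clucene: %.2fx\n", speedup(20379.0, 9930.0));
    std::printf("Indexing, Jlucene vs Clucene:     %.2fx\n", speedup(646453.0, 232141.0));
    std::printf("Searching, Jlucene vs Clucene:    %.2fx\n", speedup(60078.0, 48359.0));
}
```

With these inputs the ratios come out at roughly 2.05x, 2.78x and 1.24x. They describe only the specific machines and data sets listed above.
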
Platform notes
--------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
  ulimit -n 1024
  set rlim_fd_cur=1024

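On POSIX systems an application can also try to raise its own limit at startup instead of relying on the shell. The following is a minimal sketch, assuming the standard getrlimit/setrlimit calls are available; the helper name raiseOpenFileLimit is hypothetical and is not part of the CLucene API.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX)

// Hypothetical helper (not part of CLucene): raise this process's soft limit
// on open file descriptors to at least `wanted`, capped at the hard limit.
// Returns the soft limit in effect afterwards, or 0 if it could not be read.
rlim_t raiseOpenFileLimit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return 0;  // could not query the current limits
    }
    // Nothing to do if the soft limit is already high enough.
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
        return lim.rlim_cur;
    }
    // An unprivileged process may not raise the soft limit past the hard limit.
    rlim_t target = wanted;
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max) {
        target = lim.rlim_max;
    }
    struct rlimit updated = lim;
    updated.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &updated) != 0) {
        return lim.rlim_cur;  // request refused; the old limit still applies
    }
    return target;
}
```

Calling raiseOpenFileLimit(1024) once before opening an index corresponds to the ulimit -n 1024 setting above, but it only affects the calling process and cannot exceed the hard limit set by the administrator.
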
Acknowledgments
---------------

The Apache Lucene project is the basis for this software, so the biggest
acknowledgment goes to that project.

We wish to acknowledge the following copyrighted works that
make up portions of the CLucene software:

CLucene relies heavily on the use of autoconf and libtool to provide
a build environment.