CLucene README
==============

------------------------------------------------------
CLucene is a C++ port of Lucene.
It is a high-performance, full-featured text search
engine written in C++. In the limited benchmarks listed
under Performance, CLucene ran faster than Java Lucene.
------------------------------------------------------

CLucene has contributions from many people; see AUTHORS.

CLucene is distributed under the GNU Lesser General Public License (LGPL)
*or*
the Apache License, Version 2.0
See the LGPL.license and APACHE.license for the respective license information.
Read COPYING for more about the license.

Installation
------------
* For Linux, Mac OS X, Cygwin and MinGW build information, read INSTALL.
* Boost.Jam files are provided in the root directory and subdirectories.
* Microsoft Visual Studio (6 and 7) project files are provided in the win32 folder.

Mailing List
------------
Questions and discussion should be directed to the CLucene mailing list
at clucene-developers@lists.sourceforge.net
Find subscription instructions at
http://lists.sourceforge.net/lists/listinfo/clucene-developers
Suggestions and bug reports can be made on our bug tracking database
(http://sourceforge.net/tracker/?group_id=80013&atid=558446)

The latest version
------------------
Details of the latest version can be found on the CLucene sourceforge project
web site: http://www.sourceforge.net/projects/clucene

Documentation
-------------
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
You can also build your own documentation by running doxygen from the root directory
of CLucene.
CLucene is a very close port of Java Lucene, so you can also try looking at the
Java docs at http://lucene.apache.org/java/

Performance
-----------
Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
. running with java 1.4.1_01-99 : 20379 ms
. running with gcj 3.3.2 -O2 : 17842 ms
. running clucene 0.8.9's demo : 9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
• Jlucene: 646453 ms. peak mem usage ~72mb, avg ~14mb ram
• Clucene: 232141 ms. peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
• Jlucene: ~60078 ms and used ~13mb ram
• Clucene: ~48359 ms and used ~4.2mb ram

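To put the timings above side by side, the ratios can be worked out with a
few lines of C++. This is only a sketch: speedup() and print_speedups() are
hypothetical helpers, not part of CLucene, and it assumes the CLucene indexing
figure of 232141 is in milliseconds like the Jlucene one. The numbers come
from single, informal runs, so the ratios are indicative at best.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper, not part of CLucene: how many times faster the
// candidate run was than the baseline run, given both wall-clock times.
double speedup(double baseline_ms, double candidate_ms) {
    assert(candidate_ms > 0.0);  // guard against division by zero
    return baseline_ms / candidate_ms;
}

// Prints the Java Lucene / CLucene ratios for the timings quoted above.
void print_speedups() {
    // 250 HTML files, java 1.4.1 vs. the clucene 0.8.9 demo
    std::printf("HTML indexing:      %.2fx\n", speedup(20379, 9930));
    // 663mb of Gutenberg texts, Jlucene vs. Clucene
    std::printf("Gutenberg indexing: %.2fx\n", speedup(646453, 232141));
    // 10,000 single word queries, Jlucene vs. Clucene
    std::printf("Searching:          %.2fx\n", speedup(60078, 48359));
}
```

Calling print_speedups() from a main() gives ratios of roughly 2.05x and
2.78x for the two indexing runs and roughly 1.24x for searching.
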
Platform notes
--------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
ulimit -n 1024
set rlim_fd_cur=1024

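On POSIX systems an application that embeds CLucene may also be able to
raise the limit for its own process at startup. The following is a minimal
sketch, assuming getrlimit() and setrlimit() from <sys/resource.h> are
available; raise_open_file_limit() is a hypothetical helper, not part of the
CLucene API, and the value 1024 is simply the one used in the commands above.

```cpp
#include <cassert>
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX)

// Hypothetical helper, not part of CLucene: try to raise the soft limit on
// open file descriptors for the current process to at least `wanted`.
// Returns true if the soft limit is at least `wanted` afterwards.
bool raise_open_file_limit(rlim_t wanted) {
    assert(wanted > 0);  // a limit of zero makes no sense here

    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return false;  // could not read the current limits

    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted)
        return true;  // already high enough, nothing to do

    // An unprivileged process can raise its soft limit only up to the
    // hard limit, so clamp the request to rlim_max.
    rlim_t target = wanted;
    if (lim.rlim_max != RLIM_INFINITY && lim.rlim_max < wanted)
        target = lim.rlim_max;

    lim.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
        return false;

    return target >= wanted;
}
```

A program could call raise_open_file_limit(1024) before opening an index and
fall back to the shell or system settings above if it returns false.
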
Acknowledgments
---------------

The Apache Lucene project is the basis for this software, so the biggest
acknowledgment goes to that project.

We wish to acknowledge the following copyrighted works that
make up portions of the CLucene software:

CLucene relies heavily on the use of autoconf and libtool to provide
a build environment.