"Full-Text" Search? High Performance Queries!
- Search for keywords ("Java")
- Search with wildcards ("Java*")
- Fuzzy search ("Java~" also finds "Lava")
- Proximity search for terms located close together ('"Java Applicationserver"~4' matches both terms within four words of each other)
- Range queries ("[500 TO 700]")
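To make the wildcard, fuzzy and proximity matches in the list above more concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not how Lucene implements these queries; the class and method names (QuerySemanticsSketch, matchesPrefixWildcard, editDistance, withinProximity) are made up for illustration. It only shows why a fuzzy search for "Java" can hit "Lava" (the two words are one character substitution apart) and what "within four words" means in a simplified form.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: plain Java, not Lucene's API or its implementation.
class QuerySemanticsSketch {

    // Trailing-'*' wildcard: the term must start with the text before the '*'.
    // Without a trailing '*', the pattern must equal the term (ignoring case).
    static boolean matchesPrefixWildcard(String pattern, String term) {
        if (!pattern.endsWith("*")) {
            return pattern.equalsIgnoreCase(term);
        }
        String prefix = pattern.substring(0, pattern.length() - 1).toLowerCase();
        return term.toLowerCase().startsWith(prefix);
    }

    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions or substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Simplified proximity check: true if some occurrence of 'first' and some
    // occurrence of 'second' are at most maxDistance token positions apart.
    static boolean withinProximity(List<String> tokens, String first,
                                   String second, int maxDistance) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equalsIgnoreCase(first)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).equalsIgnoreCase(second)
                        && Math.abs(i - j) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList(
                "java", "is", "a", "popular", "applicationserver");
        System.out.println(matchesPrefixWildcard("Java*", "JavaScript"));
        System.out.println(editDistance("Java", "Lava"));
        System.out.println(withinProximity(text, "java", "applicationserver", 4));
    }
}
```

A real search library adds analysis, scoring and index structures on top of such checks, so this sketch is only meant to explain the matching rules, not to replace them.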
Besides the query API, the Lucene framework also offers a query parser that can be used for user interfaces; the parser ultimately builds its queries with the query API mentioned above.
Lucene's main applications, "the usual suspects", are problems like indexing websites, indexing PDF and Office documents on a file server, indexing wikis (Wikipedia) and so on. Yet, given these powerful query options, Lucene has recently been used in some projects to replace traditional databases. This can be a good idea when large amounts of data have to be accessed efficiently, access is mostly read-only (few changes or writes) and transactions are not important. Objects could be serialised, e.g. as XML, stored on disk and added to the Lucene index. This can be an easier procedure than using a database. There are drawbacks, of course, such as the fact that the data is not as highly structured as in a relational database.
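The pattern described above (objects stored as files, plus an index that maps search terms back to those files) can be sketched with a tiny in-memory inverted index. This is a plain-Java toy under stated assumptions, not Lucene and not its on-disk format: the class TinyInvertedIndex, its methods and the file names such as "customer-1.xml" are hypothetical, and the text passed to add() stands in for the searchable fields of a serialised object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lower-cased term to the ids of the
// documents (here: hypothetical file names) that contain it.
class TinyInvertedIndex {

    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // Split the text on non-word characters and register every token
    // under the given document id.
    void add(String docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of all documents containing the term (read-only view).
    Set<String> search(String term) {
        TreeSet<String> hits = postings.get(term.toLowerCase());
        if (hits == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        // The ids play the role of files holding the serialised objects.
        index.add("customer-1.xml", "Alice, Java developer, Berlin");
        index.add("customer-2.xml", "Bob, Perl developer, Hamburg");
        System.out.println(index.search("developer"));
        System.out.println(index.search("java"));
    }
}
```

A lookup returns only document ids; the application would then load and deserialise the matching files itself. This also illustrates the drawback mentioned above: the index knows terms and documents, but nothing about relations or constraints between objects.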
The success of the Lucene project, which originated as a Java project in the Apache software pool, has meanwhile been followed by ports to other languages such as Perl, Python, C++, Ruby and .NET. However, it should be noted that not all ports yet match the quality of the Java version.
One additional tip for working with Lucene is Luke:
Luke is a very handy tool for inspecting Lucene indices. Because they are stored in binary form, they are hardly readable with ordinary editors. When developing Lucene applications, Luke is very helpful for checking whether the index really looks as expected.
Remark: Sorry for my earlier mistakes: first I forgot to mention one of the two authors, and then I misspelled the name of the other one...