The purpose of this website is to share the results of my thesis, whose subject was to study the architecture and implementation of Apache Lucene.
At the beginning of this project, D.k.d Internet Services, a Typo3 web development company in Frankfurt, Germany, had the idea of creating a better search engine extension for Typo3. This led them to Solr, a Lucene-based web search application. Together with my supervisor, Dr. Renz, we had the idea of studying the internal architecture of Apache Lucene in order to make better use of its components.
I first implemented a small search engine based on Lucene, named SeboL, to help me explore the Lucene components in depth. Delving into Lucene indexing was quite a difficult task because of the complexity of the library. In the end, it was possible to illustrate the internal architecture of the following Lucene components: Field, Lucene Document, Lucene Analysis, the index writing mechanism, the decorator pattern used by the Analyzer, the Lucene index file formats, and the structure of a Lucene Query object.
On this website I'll present the main diagrams of those components and their interactions.
The SeboL search engine is also available for download in the Download section.
Lucene was developed in 1998 by Doug Cutting and published on SourceForge as an open source project. Lucene is not an abbreviation but the middle name of Doug Cutting's wife. Since 2001, Lucene has been part of the Apache Software Foundation and is called Apache Lucene.
According to its founder, Doug Cutting, 'Apache Lucene is a software library for full-text search. It's not an application but rather a technology that can be incorporated in applications.'
Here is a short video of Doug Cutting talking about the future of search: Lucene, Nutch and Hadoop.
The compositional structure of an application based on Lucene may have the following components: