Apache Lucene

Architecture and Implementation

The Search Process
In a Lucene-based search application, the major components involved are the Query, the QueryParser, the IndexSearcher, the Filter and the Analyzer. The FMC block diagram below shows how these components work together when searching for a given user query.

 

Unlike the document parser, which is supplied by the application developer, the QueryParser is a class in the Lucene core, generated by the Java Compiler Compiler (JavaCC). When a user query passes through the query parsing process, it becomes a Lucene Query.
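To make the parsing step concrete, here is a minimal sketch in Java. It is a hypothetical toy parser (the names ToyQueryParser, Clause and Occur are assumptions, not Lucene classes) that understands only whitespace-separated terms and the AND, OR and NOT operators; Lucene's real JavaCC-generated QueryParser supports a much richer syntax. Its output mimics a boolean query whose clauses are required (MUST), optional (SHOULD) or prohibited (MUST_NOT), similar to the values of Lucene's BooleanClause.Occur.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// HYPOTHETICAL toy parser, not Lucene's JavaCC-generated QueryParser.
// It turns a user query string into boolean clauses: each clause pairs a
// term with how that term must occur in a matching document.
class ToyQueryParser {

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }

        @Override
        public String toString() { return occur + ":" + term; }
    }

    /** Parses a query such as "Spider AND web NOT snake" into boolean clauses. */
    static List<Clause> parse(String userQuery) {
        List<Clause> clauses = new ArrayList<>();
        Occur next = Occur.SHOULD;                      // no operator behaves like OR
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.isEmpty()) continue;              // empty input yields no clauses
            if (token.equals("AND")) {
                // AND also makes the preceding term required, unless it is prohibited
                int last = clauses.size() - 1;
                if (last >= 0 && clauses.get(last).occur == Occur.SHOULD) {
                    clauses.set(last, new Clause(clauses.get(last).term, Occur.MUST));
                }
                next = Occur.MUST;
            } else if (token.equals("OR")) {
                next = Occur.SHOULD;
            } else if (token.equals("NOT")) {
                next = Occur.MUST_NOT;
            } else {
                // lower-casing stands in for the Analyzer step
                clauses.add(new Clause(token.toLowerCase(Locale.ROOT), next));
                next = Occur.SHOULD;
            }
        }
        return clauses;
    }
}
```

With this sketch, parse("Spider AND web NOT snake") yields the clauses MUST:spider, MUST:web and MUST_NOT:snake, while parse("spider web") yields two SHOULD clauses, because terms without an operator are combined as with OR.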

A Lucene Query is a set of one or more terms, symbols and operators. A term is a word such as 'spider', used in the TermQuery, which is the basic query form. An operator can be a boolean operator such as 'AND', 'OR' or 'NOT', and a symbol is a character such as * or ? used in the WildcardQuery. Lucene supplies more than ten different types of queries: TermQuery, MultiTermQuery, FuzzyQuery, TermRangeQuery, NumericRangeQuery, SpanQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery and MultiPhraseQuery.
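The meaning of the wildcard symbols can be illustrated with a short sketch. Conceptually, a wildcard pattern is matched against the terms in the index's term dictionary, where * stands for any sequence of characters (including none) and ? stands for exactly one character. WildcardSketch, matches and expand are hypothetical names chosen for illustration, and the linear scan over every term in a plain list is a simplification chosen for clarity, not how Lucene enumerates terms.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL sketch of wildcard term expansion; not Lucene's implementation.
class WildcardSketch {

    /** True if term matches pattern: '*' = any run of chars (possibly empty), '?' = exactly one char. */
    static boolean matches(String pattern, String term) {
        int p = 0, t = 0;
        int starP = -1, starT = -1;                 // position of the last '*' seen, and where it started matching
        while (t < term.length()) {
            if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;                                // literal or '?' consumes one character
                t++;
            } else if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;                        // first let '*' match the empty string
                starT = t;
            } else if (starP >= 0) {
                p = starP + 1;                      // backtrack: let '*' absorb one more character
                t = ++starT;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') p++;  // trailing '*' match nothing
        return p == pattern.length();
    }

    /** Returns every term of the dictionary that the pattern matches, in dictionary order. */
    static List<String> expand(String pattern, List<String> termDictionary) {
        List<String> out = new ArrayList<>();
        for (String term : termDictionary) {
            if (matches(pattern, term)) out.add(term);
        }
        return out;
    }
}
```

For a dictionary containing spider, spiders, spin, web and webbing, expand("spi*", ...) returns spider, spiders and spin; the expanded terms can then be searched like ordinary terms.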

The IndexSearcher stands at the beginning and at the end of the retrieval process. It receives a valid Lucene Query from the QueryParser, then looks up the terms in the index that match the query together with their postings (the lists of documents containing each term), and finally returns the top hits for that query as a TopDocs object. For each matching document, TopDocs stores the document's number and score as a ScoreDoc entry in its list of hits. The next part delves into the search mechanism.
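The retrieval step can be modelled with a toy inverted index in plain Java. The sketch is hypothetical: ToySearcher and Hit are illustrative names, not Lucene's IndexSearcher, TopDocs or ScoreDoc, and scoring a document by the sum of its matching term frequencies is an assumption made for brevity (Lucene computes scores through its Similarity classes). It shows the two ideas from the paragraph: the postings of each query term are looked up, and only the top n hits are kept, here with a bounded min-heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// HYPOTHETICAL toy model of retrieval; names are illustrative, not Lucene's API,
// and the scoring is deliberately simplistic.
class ToySearcher {

    /** One hit: a document number and its score (compare Lucene's ScoreDoc). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Inverted index: term -> postings (doc number -> frequency of the term in that doc)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int nextDoc = 0;

    /** Indexes lower-cased, whitespace-separated tokens; returns the new doc number. */
    int addDocument(String text) {
        int doc = nextDoc++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>()).merge(doc, 1, Integer::sum);
        }
        return doc;
    }

    /** OR-style search: a document's score is the sum of its matching term frequencies. */
    List<Hit> search(List<String> terms, int n) {
        Map<Integer, Float> scores = new HashMap<>();
        for (String term : terms) {
            Map<Integer, Integer> list = postings.getOrDefault(term, Collections.<Integer, Integer>emptyMap());
            for (Map.Entry<Integer, Integer> e : list.entrySet()) {
                scores.merge(e.getKey(), (float) e.getValue(), Float::sum);
            }
        }
        // Min-heap holding at most n hits; its head is the current worst hit.
        Comparator<Hit> worstFirst = (a, b) -> {
            int c = Float.compare(a.score, b.score);            // lower score is worse
            return c != 0 ? c : Integer.compare(b.doc, a.doc);  // on ties, higher doc number is worse
        };
        PriorityQueue<Hit> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<Integer, Float> e : scores.entrySet()) {
            heap.offer(new Hit(e.getKey(), e.getValue()));
            if (heap.size() > n) heap.poll();                  // evict the worst hit
        }
        List<Hit> top = new ArrayList<>();
        while (!heap.isEmpty()) top.add(heap.poll());          // drains worst to best
        Collections.reverse(top);                               // best hit first
        return top;
    }
}
```

For example, after indexing "the spider spins a web", "spider spider monkey" and "web browser" as documents 0, 1 and 2, searching for spider and web with n = 2 returns documents 0 and 1, each with score 2; in this sketch ties are broken in favour of the lower document number. The bounded heap keeps memory proportional to n rather than to the number of matching documents.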

Other significant components are the Filter and the Analyzer. Filters are used to select which documents are allowed in the search results. The Lucene library supplies six filters that can be used to restrict the search results: the CachingWrapperFilter, the FieldCacheRangeFilter, the FieldCacheTermsFilter, the MultiTermQueryWrapperFilter, the QueryWrapperFilter and the SpanFilter. Each of them extends the base Filter class, and each except the SpanFilter overrides the getDocIdSet() method to match its own filtering purpose.
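The role of a filter can be sketched as computing a set of allowed document numbers, which is then intersected with the documents matching the query. In the hypothetical sketch below, ToyFilter.docIdSet loosely plays the part of getDocIdSet(), CategoryFilter keeps documents whose field value belongs to an accepted set (roughly the idea behind FieldCacheTermsFilter), and CachingFilter reuses an already computed result (roughly the idea behind CachingWrapperFilter). All names are assumptions, and java.util.BitSet stands in for Lucene's DocIdSet.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// HYPOTHETICAL sketch; these classes only mimic the role of Lucene filters.
class FilterSketch {

    /** A filter computes which document numbers may appear in the results. */
    interface ToyFilter {
        BitSet docIdSet(int maxDoc);
    }

    /** Allows documents whose per-document "category" value is one of the accepted values. */
    static final class CategoryFilter implements ToyFilter {
        private final String[] categoryByDoc;   // field value, indexed by doc number
        private final Set<String> accepted;

        CategoryFilter(String[] categoryByDoc, String... accepted) {
            this.categoryByDoc = categoryByDoc;
            this.accepted = new HashSet<>(Arrays.asList(accepted));
        }

        @Override
        public BitSet docIdSet(int maxDoc) {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (accepted.contains(categoryByDoc[doc])) bits.set(doc);
            }
            return bits;
        }
    }

    /** Computes the wrapped filter's result once per maxDoc and reuses it afterwards. */
    static final class CachingFilter implements ToyFilter {
        private final ToyFilter inner;
        private BitSet cached;
        private int cachedMaxDoc = -1;
        int computations = 0;                   // how often the inner filter actually ran

        CachingFilter(ToyFilter inner) { this.inner = inner; }

        @Override
        public synchronized BitSet docIdSet(int maxDoc) {
            if (cached == null || cachedMaxDoc != maxDoc) {
                cached = inner.docIdSet(maxDoc);
                cachedMaxDoc = maxDoc;
                computations++;
            }
            return (BitSet) cached.clone();     // copy so callers cannot modify the cache
        }
    }

    /** Keeps only the matching documents that the filter allows. */
    static BitSet applyFilter(BitSet matches, ToyFilter filter, int maxDoc) {
        BitSet result = (BitSet) matches.clone();
        result.and(filter.docIdSet(maxDoc));
        return result;
    }
}
```

If documents 0 to 3 have the categories news, blog, news and forum, and the query matches documents 0, 1 and 3, a filter accepting news and forum reduces the results to documents 0 and 3. Caching pays off when the same filter is applied to many queries, since the allowed set is computed only once.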

 

How Is the Index Used for Search?