Creating an index involves two distinct processes. The first one populates Lucene Documents with Fields. It is the search engine application's responsibility to convert the original data (PDF, HTML, plain text, ...) into Lucene Documents made of (field, value) pairs, using an appropriate document parser (e.g. PDFBox, JTidy, SAX).
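As a rough illustration of this first process, the sketch below turns one raw file into (field name, value) pairs using only the JDK. The class RawFileConverter, its method toFields and the field names path, name and content are hypothetical choices for this example, not part of Lucene, and the regular-expression tag stripping is only a naive stand-in for a real parser such as JTidy.

```java
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical converter (not part of Lucene): turns one raw file into the
// (field name -> value) pairs an application would copy into a Lucene Document.
class RawFileConverter {
    static Map<String, String> toFields(String path, String rawText) {
        String text = rawText;
        if (path.endsWith(".html") || path.endsWith(".htm")) {
            // Naive tag stripping; a real application would use a parser such as JTidy.
            text = text.replaceAll("<[^>]*>", " ");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", path);
        fields.put("name", Paths.get(path).getFileName().toString());
        // Collapse runs of whitespace left over from the markup or the file layout.
        fields.put("content", text.replaceAll("\\s+", " ").trim());
        return fields;
    }
}
```

Calling `RawFileConverter.toFields("docs/intro.html", "<p>Hello <b>Lucene</b></p>")` yields the fields path, name and content, with "Hello Lucene" as the content value.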
Once the Lucene Documents are created, the second process is handled by the IndexWriter, which creates and maintains the index: its addDocument(Document) method adds the values of each Document's Fields to the index.
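To give an intuition of what "adding field values to the index" means, here is a toy inverted index written with the JDK only. It is an assumption-laden model, not Lucene's implementation: ToyIndexWriter and docsWith are hypothetical names, and the tokenization (lowercase, split on non-word characters) is deliberately crude. Each call to addDocument assigns the next document id and records, for every "field:term" key, which documents contain that term.

```java
import java.util.*;

// Toy model (not Lucene's implementation): an inverted index mapping
// "field:term" to the ids of the documents containing that term.
class ToyIndexWriter {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Assigns the next document id and records every term of every field.
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue; // split can yield a leading empty token
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Ids of the documents whose given field contains the given term.
    SortedSet<Integer> docsWith(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new TreeSet<>());
    }
}
```

A real Lucene index stores much more than this (term frequencies, positions, stored field values, segment files); the map only shows the term-to-document mapping that makes searching fast.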
One way to create an IndexWriter in Lucene 3.0 is the following:
IndexWriter w = new IndexWriter(FSDirectory.open(indexdir),
        new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
FSDirectory is an implementation of Directory that stores the index in a new or existing directory on the file system. Here, the index files are stored in indexdir.
The Analyzer is the strategy the IndexWriter uses to analyze the Documents' fields before they are indexed. In this case we choose the general-purpose StandardAnalyzer, configured for Lucene version 3.0. We also choose not to limit the length of a field (IndexWriter.MaxFieldLength.UNLIMITED), so that all the terms in a field are indexed.
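To make the Analyzer's role and the field-length limit concrete, the sketch below is a deliberately simplified stand-in written with the JDK only. It is not StandardAnalyzer, whose tokenizer follows a much richer grammar: the class name ToyAnalyzer, its short stop-word list and its UNLIMITED constant are assumptions for illustration. The text is split on anything that is not a letter or digit, lowercased, stripped of a few English stop words, and capped at maxFieldLength terms.

```java
import java.util.*;

// Simplified stand-in for an Analyzer (NOT StandardAnalyzer's real grammar):
// split on non-letters/digits, lowercase, drop a few stop words, and keep
// at most maxFieldLength terms, mimicking a per-field term limit.
class ToyAnalyzer {
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "is"));
    static final int UNLIMITED = Integer.MAX_VALUE;

    static List<String> analyze(String text, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || STOP_WORDS.contains(term)) continue; // skip noise
            if (terms.size() >= maxFieldLength) break;                  // field limit reached
            terms.add(term);
        }
        return terms;
    }
}
```

With UNLIMITED every remaining term is kept; with a small limit such as 2, only the first two terms of "The Quick brown fox" survive ("quick", "brown"), which is why a limited field length can silently drop the end of long documents.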
To prevent concurrent modifications, the IndexWriter holds a write lock on the index directory until it is closed, so no other IndexWriter can open the same index in the meantime.
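The principle of such a lock can be sketched with the JDK alone: the first writer atomically creates a lock file in the index directory, and any other writer that finds the file already there must give up. This is only an illustration, not Lucene's LockFactory machinery; the class ToyWriteLock and its methods are hypothetical, while the file name write.lock mirrors the name Lucene uses for its write lock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;

// Toy sketch of the idea behind IndexWriter's write lock (not Lucene's LockFactory):
// the first writer atomically creates "write.lock" in the index directory;
// a second writer sees the file already exists and must not open the index.
class ToyWriteLock {
    private final Path lockFile;

    ToyWriteLock(Path indexDir) {
        this.lockFile = indexDir.resolve("write.lock");
    }

    boolean tryAcquire() {
        try {
            Files.createFile(lockFile); // atomic: fails if the file already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;               // another writer holds the lock
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void release() {
        try {
            Files.deleteIfExists(lockFile); // done when the writer is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A plain lock file like this one can be left behind if the process crashes, which is one reason real implementations offer other locking strategies as well.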
The next step is to look at each component of this process in detail, starting with the Lucene Document.
A Lucene Document is a set of Fields. A Field has a name and a value, and the same field name may appear several times in a Document to hold several values. The name is a String describing the field; content, path, name and creation date are typical field names. The value is usually the text of that field.
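The structure described above can be modeled in a few lines of plain Java. ToyField and ToyDocument are hypothetical classes for illustration, not Lucene's Document and Field API (which also carry flags such as whether a field is stored or indexed); they only show that a document is an ordered collection of (name, value) fields in which a name may repeat, so one name can map to several values.

```java
import java.util.*;

// Toy model of the structure described above (hypothetical classes, not
// Lucene's Document/Field API): a document is an ordered list of (name, value)
// fields, and the same name may be added several times to hold several values.
final class ToyField {
    final String name;
    final String value;

    ToyField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

final class ToyDocument {
    private final List<ToyField> fields = new ArrayList<>();

    void add(ToyField field) {
        fields.add(field);
    }

    // All values stored under one field name, in insertion order.
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (ToyField f : fields) {
            if (f.name.equals(name)) values.add(f.value);
        }
        return values;
    }
}
```

For example, adding two "author" fields to the same ToyDocument makes getValues("author") return both authors, while asking for a name that was never added returns an empty list.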
A Lucene Document is used in the following three cases: