We are using Lucene.Net in Communifire, and we noticed that with a large amount of data, the application does not show any search results when someone clicks the search button to find a keyword. Lucene indexing starts in multiple threads as soon as the web app starts for the first time. This index building runs asynchronously so that the home page does not have to wait for the search indexes to be created. Until now, we never called any Lucene search on our home page, so everything worked fine. The problem started when we began running a Lucene search on the home page itself, through a generic control that uses Lucene to fetch specific filtered data.
So the sequence was:
1. The web app starts (first hit to the site).
2. ThreadOne starts loading the default page, while another thread, ThreadTwo, starts indexing the data for Lucene (in a separate thread for each datasource).
3. While loading the home page, ThreadOne encounters the generic control, which calls the Lucene search method. Unfortunately, our friend ThreadTwo is still building the Lucene indexes, so poor ThreadOne returns no records because no indexes have been built yet.
4. ThreadOne finishes and the home page is shown, with the generic controls displaying no data from Lucene. Meanwhile, ThreadTwo is still building the indexes, and is getting a bit tired by now :)
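To make the race in step 3 concrete, here is a minimal Python sketch (not Lucene.Net code). `InMemoryIndex`, `build_index` and `render_home_page` are hypothetical stand-ins for a Lucene index, ThreadTwo's indexing job and ThreadOne's generic control. The `threading.Event` only holds the indexer back so that the early search is reproducible; in the real app, indexing simply takes a while.

```python
import threading


class InMemoryIndex:
    """Hypothetical stand-in for a Lucene index: keyword -> list of record ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._postings = {}

    def add(self, record_id, text):
        with self._lock:
            for word in text.lower().split():
                self._postings.setdefault(word, []).append(record_id)

    def search(self, keyword):
        with self._lock:
            return list(self._postings.get(keyword.lower(), []))


def build_index(index, records, start_signal):
    # ThreadTwo: waits for the signal only to make the demo deterministic.
    start_signal.wait()
    for record_id, text in records:
        index.add(record_id, text)


def render_home_page(index):
    # ThreadOne: the generic control queries the index without checking
    # whether indexing has finished.
    return index.search("lucene")


records = [(1, "Lucene basics"), (2, "Search with Lucene"), (3, "Unrelated post")]
index = InMemoryIndex()
start = threading.Event()
indexer = threading.Thread(target=build_index, args=(index, records, start))
indexer.start()

first_hit = render_home_page(index)  # indexing has not run yet
start.set()
indexer.join()
refreshed = render_home_page(index)  # after indexing has completed
print(first_hit, refreshed)  # [] [1, 2]
```

Nothing fails on the first hit; the search just runs against an empty index, which is why the symptom is an empty control rather than an error.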
Building the indexes usually takes 1-2 minutes for around 100K records (if on the same server). Once ThreadTwo finishes and you refresh the page, all the generic controls show the data. To fix this from a usability point of view (users generally do not like to see empty controls), we had two options:
Option 1: Do not use parallel (asynchronous) tasks to build the Lucene indexes. Keep them sequential so that the home page loads ONLY after all the indexes have been built. This makes sure that the user sees the data, but they will have to wait a few minutes for the home page to load, so the first hit to the site will be very slow. Once the indexes have been built, subsequent hits will be quite fast.
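Option 1 amounts to indexing every datasource on the startup path before the first page is served. A minimal Python sketch, where `index_datasource`, `start_app_sequential` and `render_home_page` are hypothetical names and a plain dictionary stands in for each Lucene index:

```python
def index_datasource(records):
    """Hypothetical stand-in for building one Lucene index: word -> record ids."""
    postings = {}
    for record_id, text in records:
        for word in text.lower().split():
            postings.setdefault(word, []).append(record_id)
    return postings


def start_app_sequential(datasources):
    """Option 1: build every index on the startup thread before serving anything."""
    indexes = {}
    for name, records in datasources.items():
        indexes[name] = index_datasource(records)  # blocks until this source is done
    return indexes  # only now may the home page render


def render_home_page(indexes, source, keyword):
    return indexes[source].get(keyword.lower(), [])


datasources = {
    "blogs": [(1, "Lucene basics"), (2, "Search with Lucene")],
    "wikis": [(10, "Deployment notes")],
}
indexes = start_app_sequential(datasources)
print(render_home_page(indexes, "blogs", "lucene"))  # [1, 2]
```

The cost is that `start_app_sequential` returns only after the last datasource is indexed, which is exactly the slow first hit described above.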
Option 2: Add settings to a config file that turn parallel (asynchronous) Lucene indexing on or off for individual datasources. For example, if only certain datasources are shown inside the generic controls on the home page, we turn OFF parallel loading for those datasources and let Lucene index the other sources asynchronously.
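A minimal Python sketch of the Option 2 idea, assuming a per-datasource on/off flag like `INDEX_ASYNC` (the name, keys and default are hypothetical, not actual Communifire settings). Datasources flagged as synchronous are indexed before `start_app` returns; the rest are submitted to a `ThreadPoolExecutor`, and a search against a source that is not ready yet returns an empty list, just as ThreadOne saw.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical per-datasource settings, as they might be read from a config file.
# False -> index synchronously at startup (a home-page control depends on it)
# True  -> index in the background (parallel/async)
INDEX_ASYNC = {
    "blogs": False,
    "wikis": True,
    "forums": True,
}


def index_datasource(records):
    """Hypothetical stand-in for building one Lucene index: word -> record ids."""
    postings = {}
    for record_id, text in records:
        for word in text.lower().split():
            postings.setdefault(word, []).append(record_id)
    return postings


class IndexRegistry:
    """Thread-safe holder for the per-datasource indexes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._indexes = {}

    def put(self, name, postings):
        with self._lock:
            self._indexes[name] = postings

    def search(self, name, keyword):
        with self._lock:
            postings = self._indexes.get(name)
        # A datasource that is still being indexed simply returns no hits.
        return [] if postings is None else list(postings.get(keyword.lower(), []))


def build_into(registry, name, records):
    registry.put(name, index_datasource(records))


def start_app(datasources, settings, registry, executor):
    """Index synchronous sources now; queue the rest on background threads."""
    pending = []
    for name, records in datasources.items():
        if settings.get(name, True):  # sources missing from the config default to async
            pending.append(executor.submit(build_into, registry, name, records))
        else:
            build_into(registry, name, records)  # blocks startup for this source only
    return pending


datasources = {
    "blogs": [(1, "Lucene basics"), (2, "Search with Lucene")],
    "wikis": [(10, "Lucene deployment notes")],
    "forums": [(20, "Lucene question")],
}
registry = IndexRegistry()
with ThreadPoolExecutor(max_workers=2) as executor:
    pending = start_app(datasources, INDEX_ASYNC, registry, executor)
    home_page_results = registry.search("blogs", "lucene")  # built before start_app returned
    wait(pending)
wiki_results = registry.search("wikis", "lucene")
print(home_page_results, wiki_results)  # [1, 2] [10]
```

Defaulting unlisted datasources to asynchronous keeps startup fast; in this sketch only the sources a home-page control actually reads pay the synchronous cost.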
We went with Option 2 because it offered more flexibility across different deployment options.