A Guide To Filtered Search In Umbraco

 

 

A step-by-step guide to creating a filtered site search that either targets specific content types, including media files (pdf, docx, doc etc.) in the /media directory, or searches across all content types, using the Umbraco Examine wrapper for the Lucene.net indexing/search engine.

 

For Info: Approach tested on Umbraco version 7.4.3 assembly: 1.0.5948.18141

 

Overview - we will create an index, an indexer and a searcher for each specific search filter and then a multi index searcher to search across all our indexes. Finally we will use the searchers in a filtered search page.

 

Step 1. Get the zip file from the link below - DO NOT RUN THE INSTALLER - we want to keep everything transparent, so we'll do the install ourselves.

 

https://our.umbraco.org/FileDownload?id=4210

 

(Credit must go to Ismail here - https://our.umbraco.org/member/1203 - this article is a modified and expanded version of his own article and uses his contributed media indexer dll)

 

Once you have the zip file - extract it and copy the following dlls ONLY to your /bin folder

 

CogUmbracoExamineMediaIndexer.dll
IKVM.OpenJDK.Core.dll
IKVM.OpenJDK.Jdbc.dll
IKVM.OpenJDK.Media.dll
IKVM.OpenJDK.Security.dll
IKVM.OpenJDK.SwingAWT.dll
IKVM.OpenJDK.Text.dll
IKVM.OpenJDK.Util.dll
IKVM.OpenJDK.XML.API.dll
IKVM.OpenJDK.XML.Parse.dll
IKVM.OpenJDK.XML.Transform.dll
IKVM.OpenJDK.XML.XPath.dll
IKVM.Runtime.dll

 

Step 2. Get the Tika dll from the link below and copy it to your /bin folder

 

https://www.dropbox.com/s/0rk556kjgd8swvy/tika-app-1.2.dll

 

Step 3. Create your indexes.

 

Open /Config/ExamineIndex.config

 

You will see the following :

 

<?xml version="1.0"?>
<!--
Umbraco examine is an extensible indexer and search engine.
This configuration file can be extended to create your own index sets.
Index/Search providers can be defined in the UmbracoSettings.config
More information and documentation can be found on CodePlex: http://umbracoexamine.codeplex.com
-->
<ExamineLuceneIndexSets>
<!-- The internal index set used by Umbraco back-office - DO NOT REMOVE -->
<IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/Internal/"/>
<!-- The internal index set used by Umbraco back-office for indexing members - DO NOT REMOVE -->
<IndexSet SetName="InternalMemberIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/InternalMember/">
<IndexAttributeFields>
<add Name="id"/>
<add Name="nodeName"/>
<add Name="updateDate"/>
<add Name="writerName"/>
<add Name="loginName"/>
<add Name="email"/>
<add Name="nodeTypeAlias"/>
</IndexAttributeFields>
</IndexSet>
<!-- Default Indexset for external searches, this indexes all fields on all types of nodes-->
<IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/External/"/>
<IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs"/>
</ExamineLuceneIndexSets>

We are going to create our own indexes in here and leave the existing indexes that come with the install alone. We will create one index that targets a specific document type and one that targets media files (.pdf, .doc and .docx) in the /media folder.

 

Starting with an index for a specific document type.

 

Identify the Document Type you wish to filter your searches on and make a note of the additional custom property aliases you have added to the Document Type that you wish to search on.

 

For example, say you have created a Document Type called alpha with custom properties summary and description, and you want to index both. Make a note of the underlying property alias used for each one (look at your document type to double check the aliases). Let's keep it simple and say the aliases match the property names!

 

In this case summary and description.

 

Under the last index set that comes with the install in the file /Config/ExamineIndex.config we will create a new index called AlphaIndexSet thus:

 

<!-- Indexset for Alpha Documents-->
<IndexSet SetName="AlphaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Alpha">
<!--the standard properties-->
<IndexAttributeFields>
<add Name="id" />
<add Name="nodeName" />
<add Name="updateDate" />
<add Name="writerName" />
<add Name="path" />
<add Name="nodeTypeAlias" />
<add Name="parentID" />
</IndexAttributeFields>
<!--the document type properties we created - use exact alias-->
<IndexUserFields>
<add Name="summary"/>
<add Name="description"/>
</IndexUserFields>
<!--confine to nodes of document type Alpha-->
<IncludeNodeTypes>
<add Name="alpha"/>
</IncludeNodeTypes>
</IndexSet>

The index above defines the standard properties to index, the custom properties to index and then confines the index to documents of type alpha. All good.
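As a variation on the index set above, a single set can cover more than one document type by listing several aliases under IncludeNodeTypes. The following is only a sketch: the document type alias beta, the set name AlphaBetaIndexSet and its index path are hypothetical and are not used anywhere else in this guide.

```xml
<!-- Hypothetical variation: one index set covering two document types -->
<IndexSet SetName="AlphaBetaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/AlphaBeta">
  <!-- a trimmed set of standard properties -->
  <IndexAttributeFields>
    <add Name="id" />
    <add Name="nodeName" />
    <add Name="nodeTypeAlias" />
  </IndexAttributeFields>
  <!-- assumes both document types share a property with the alias summary -->
  <IndexUserFields>
    <add Name="summary" />
  </IndexUserFields>
  <!-- confine to nodes of document type alpha OR beta (beta is made up for this example) -->
  <IncludeNodeTypes>
    <add Name="alpha" />
    <add Name="beta" />
  </IncludeNodeTypes>
</IndexSet>
```

A set like this would still need its own indexer and searcher, set up in the same way as for the other index sets in this guide.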

 

Now we will create an index for media files of type .pdf, .doc and .docx. For a full list of the file formats that Tika can extract text from, and which you can therefore index, see: http://tika.apache.org/1.2/formats.html#Audio_formats

 

Below the above index set for Alpha documents create a new index set.

 

N.B. I have created a mandatory custom property on Media Files called friendlyFileName - this is so that we can quickly access a good title to use as a link in the search results later -  additionally you can index this field too!

 

<!--using the CogUmbracoExamineMediaIndexer-->
<IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/MediaIndexSet">
<IndexAttributeFields>
<add Name="id" />
<add Name="nodeName" />
<add Name="updateDate" />
<add Name="writerName" />
<add Name="path" />
<add Name="nodeTypeAlias" />
<add Name="parentID" />
</IndexAttributeFields>
<!--the doc type properties we created - use the property alias-->
<IndexUserFields>
<add Name="friendlyFileName"/>
</IndexUserFields>
</IndexSet>

So - we now have two indexes! The next thing to do is establish the Indexers and the Searchers.

 

Step 4. Create your indexers - an indexer populates and maintains one of the index sets you created above.

 

Open /Config/ExamineSettings.config

 

You will see the following :

 

N.B. Indexers and Searchers are created in the same file ExamineSettings.config - for now ignore the ExamineSearchProviders area and concentrate on the ExamineIndexProviders area.

 

<?xml version="1.0"?>
<!--
Umbraco examine is an extensible indexer and search engine.
This configuration file can be extended to add your own search/index providers.
Index sets can be defined in the ExamineIndex.config if you're using the standard provider model.

More information and documentation can be found on CodePlex: http://umbracoexamine.codeplex.com
-->
<Examine>
<ExamineIndexProviders>
<providers>
<add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="true" supportProtected="true" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
<add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine" supportUnpublished="true" supportProtected="true" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
<!-- default external indexer, which excludes protected and unpublished pages-->
<add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>
<add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" extensions=".pdf" umbracoFileProperty="umbracoFile"/>
</providers>
</ExamineIndexProviders>

<ExamineSearchProviders defaultProvider="ExternalSearcher">
<providers>
<add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
<add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>
<add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true"/>
<add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>
</providers>
</ExamineSearchProviders>

</Examine>

Add two new Examine index providers immediately after the PDFIndexer that comes with install.

 

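N.B. The indexer name MUST follow the convention YourIndexNameIndexer, where YourIndexName is the index set name without its IndexSet suffix - in this case AlphaIndexer (for AlphaIndexSet) and MediaIndexer (for MediaIndexSet).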

 

<!--Indexer for AlphaIndex-->
<add name="AlphaIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportProtected="true" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/> 
<!--Indexer for MediaIndex--> 
<add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer" extensions=".pdf,.docx,.doc" umbracoFileProperty="umbracoFile" />

 

N.B. the AlphaIndexer has an attribute supportProtected="true". This allows content under protected nodes to be indexed. If you don't wish to index such content then leave this attribute out.
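On the naming convention: it is what ties a provider to its index set when nothing else is specified. As far as I'm aware, Examine providers also accept an indexSet attribute that names the index set explicitly, which can make the pairing easier to read. Treat the following as a sketch rather than a tested configuration, and check it against your own Examine version before relying on it.

```xml
<!-- Same AlphaIndexer as above, with the index set named explicitly. -->
<!-- Assumption: the indexSet attribute is honoured by your Examine version; -->
<!-- if it is not, fall back to the naming convention described above. -->
<add name="AlphaIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     indexSet="AlphaIndexSet"
     supportProtected="true"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
```

Sticking to the naming convention as well keeps the Examine Management listing easy to follow, even if you do name the index set explicitly.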

 

Step 5. Create your searchers - there will be THREE. One for Alpha, one for Media and one for BOTH Alpha and Media together which we can use in our filtered search page later. We will be able to search just Alpha, just Media or Alpha and Media together!

 

Add three new Examine search providers immediately after the PDFSearcher that comes with install in the ExamineSearchProviders section.

 

N.B. The searcher name MUST follow the convention YourIndexNameSearcher - in this case AlphaSearcher and MediaSearcher. Note that the combined MultiIndexSearcher explicitly lists the index sets to use, so it does not follow the usual naming convention: it has no single associated index or indexer.

 

<!-- Searcher for Alpha-->
<add name="AlphaSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
<!--Searcher for Media-->
<add name="MediaSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
<!--Searcher for Multiple Indexes-->
<add name="MultiIndexSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine" indexSets="MediaIndexSet,AlphaIndexSet" enableLeadingWildcards="true"/>
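Because the multi index searcher simply lists the index sets it should cover, the same pattern could be stretched to search the rest of the site's content as well. The sketch below assumes that adding the stock ExternalIndexSet to the list works in the same way as the two custom sets; the searcher name SiteWideSearcher is made up for illustration and is not used elsewhere in this guide.

```xml
<!-- Hypothetical searcher spanning the two custom index sets plus the stock external index set -->
<!-- Attribute names are copied from the MultiIndexSearcher entry above; only the name and the set list differ -->
<add name="SiteWideSearcher"
     type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine"
     indexSets="MediaIndexSet,AlphaIndexSet,ExternalIndexSet"
     enableLeadingWildcards="true"/>
```

Bear in mind that alpha documents would then most likely be present in both AlphaIndexSet and ExternalIndexSet, so the same page may come back more than once and the results page would need to de-duplicate by node id.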

 

Step 6. Let's see if your indexing efforts to date are yielding results!

 

In the CMS go to Developer and select the Examine Management tab.

 

You should see something similar to the following:

 

Examine Management
Indexers

 

InternalMemberIndexer
InternalIndexer
ExternalIndexer
PDFIndexer
AlphaIndexer
MediaIndexer


Searchers

 

InternalMemberSearcher
InternalSearcher
ExternalSearcher
PDFSearcher
AlphaSearcher
MediaSearcher
MultiIndexSearcher

 

In the Indexers list select your new indexers one by one and check there is indexed content under the Index Info & Tools link. If there isn't, rebuild the index.

 

Once you have confirmed index content exists, select your searchers one by one (don't forget to test your MultiIndexSearcher) and under Search Tools perform a Lucene search - this is the type of search our page will use. With the MultiIndexSearcher, try a term that will return Media Files as well as Alpha content.

 

In the results that are returned you should be able to pick out the custom property names you defined for indexing alongside the standard properties like nodeId.

 

OK - we now have two working indexes, two working indexers and three working searchers!

 

Step 7. Build a simple search page.

 

I'm not going to go into the process of hooking up the search query interface to the results page, aside from stating that we'll assume the results page takes a Query value and a Filter value (all, alpha or media) from the querystring. Implementation of the interface is up to you - however there are some neat interfaces out there if you search for bootstrap search with filter. I liked this one: http://bootsnipp.com/snippets/featured/search-panel-with-filters

 

Here's the basic code to perform the search.

 

Lucene is VERY powerful and the code below is intended only to get you started. You can really go to town with Lucene - that's for another post!

 

@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
@using System.Web.Mvc.Html
@using Umbraco.Web
@using Examine
@using UmbracoExamine
@using Examine.LuceneEngine.SearchCriteria
@using Examine.LuceneEngine
@using Examine.LuceneEngine.Providers
@{
     Layout = "yourLayout.cshtml";
     //my search interface will not kick off a search 
     //without a query being present
     string term = Request.QueryString["Query"];
     string filter = Request.QueryString["Filter"];
}
<strong>Search for:</strong> @term<br />
<strong>Filter by:</strong> @filter<br />

<ul>
@{
   //container for the searcher
   Examine.Providers.BaseSearchProvider searcher = null;
   //container for the results
   ISearchResults searchResults;
   //now we need to work out which searcher to use
   //so we'll switch/case on the filter  parameter in the querystring
   switch (filter)
   {
       case "alpha":
       {
           searcher = ExamineManager.Instance.SearchProviderCollection["AlphaSearcher"];
           break;
       }
       case "media":
       {
           searcher = ExamineManager.Instance.SearchProviderCollection["MediaSearcher"];
           break;
       }
       //"all", a missing filter or an unrecognised value searches across every index,
       //so the searcher is never left as null
       case "all":
       default:
       {
           searcher = ExamineManager.Instance.SearchProviderCollection["MultiIndexSearcher"];
           break;
       }
   }

   //N.B. The following is REALLY basic but should get you started. It's this area where we could really go to town with Lucene.

   //only search when we have both a searcher and a term - an empty raw query cannot be parsed
   if (searcher != null && !string.IsNullOrWhiteSpace(term))
   {
       //create search criteria
       var searchCriteria = searcher.CreateSearchCriteria();

       //pass the criteria our search term as a raw Lucene query
       var query = searchCriteria.RawQuery(term);

       //action a search based on our query
       searchResults = searcher.Search(query);

       //iterate the results
       //we could be clever here using linq queries but I want
       //to keep it simple, easily readable and understandable
       //for this exercise
       foreach (var searchResult in searchResults)
       {
           //not every result is guaranteed to carry every field, so read them defensively
           var fields = searchResult.Fields;
           string alias = fields.ContainsKey("nodeTypeAlias") ? fields["nodeTypeAlias"] : string.Empty;
           string title = fields.ContainsKey("nodeName") ? fields["nodeName"] : string.Empty;

           if (string.Equals(alias, "file", StringComparison.OrdinalIgnoreCase))
           {
               //we have found a physical file in the media folder structure
               //note that this is the line where we can capitalise on that Media File custom property we created - friendlyFileName
               //(falling back to the node name if the property was not indexed)
               var media = Umbraco.Media(searchResult.Id);
               <li>
                 <a href="@media.Url" target="_blank">@(fields.ContainsKey("friendlyFileName") ? fields["friendlyFileName"] : title)</a>
               </li>
           }
           else
           {
               //we are a content type other than a file
               <li>
                 <a href="@Umbraco.NiceUrl(searchResult.Id)">@title</a>
               </li>
           }
       }
   }
}
</ul>

Here's hoping this article helps some budding Umbraco Examine Search 1st Timers out there :-)

 
