Monday, December 30, 2013

SharePoint 2013 Ceres Shell – down the search rabbit hole

While preparing for the BIWUG session "Everything you always wanted to know about SharePoint Search Relevance", I came across a number of blog posts about the Ceres shell. If you are interested in the inner workings of SharePoint search, keep reading.

The Ceres shell is a set of PowerShell cmdlets that let you control the internal workings of SharePoint Server 2013 search. It is a remnant of the FAST search engine, now integrated into SharePoint Server 2013, which provided a set of very powerful tools to tweak the inner workings of the search engine (but also to completely tear down your search :-)).

As Christoffer Vig outlined in his blog post "Making synonyms visible in SharePoint 2013 search results", modifying the settings in the Ceres engine can completely ruin your SharePoint install, so be careful. I am not even sure that such modifications are supported.
In this post we will explore how the different content processing pipeline components work, using the Ceres shell. The Ceres shell script is located at C:\Program Files\Microsoft Office Servers\15.0\Search\Scripts\ceresshell.ps1. The script listed below connects to the Ceres engine and lists all the different flows.
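# Load the SharePoint cmdlets (no error if the snap-in is already loaded)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
# Run the Ceres shell script so the Ceres cmdlets become available in this session
& "C:\Program Files\microsoft office servers\15.0\search\Scripts\ceresshell.ps1"
# Connect to the search system manager; hpw\spfarm is the account from my farm, substitute your own
Connect-System -Uri (Get-SPEnterpriseSearchServiceApplication).SystemManagerLocations[0] -ServiceIdentity hpw\spfarm
# Without a flow name, Get-Flow lists all flows
Get-Flow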

To look at the configuration of one specific flow, pass the flow name as a parameter.

Get-Flow Microsoft.MetadataExtractorSubFlow

One of the things I had been struggling with in SharePoint 2013 is the title shown in the search results. By default, SharePoint 2013 contains a mechanism that overrides the Title managed property with text extracted from PowerPoint and Word documents, similar to the EnableOptimisticTitleOverride functionality in SharePoint 2010. Microsoft.MetadataExtractorSubFlow.dll contains logic that decides that a line of text in the header of a document is a more suitable title, and then uses it in place of every other title source (the title filled in the Word document properties, the title in SharePoint, etc.). However, this does not always work as expected.

If you look at the flow details, you will see that extractedTitleField maps onto the Title managed search property and that the Author and LastModifiedTime fields use the same mechanism.
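To make that precedence concrete, here is a toy PowerShell model of the behaviour described above. It is a sketch under stated assumptions, not the logic inside Microsoft.MetadataExtractorSubFlow.dll: the function name Resolve-ManagedPropertyValue and all sample values are hypothetical, and the model only encodes the observed rule that a non-empty extracted value wins over the value coming from the document properties or from SharePoint, for Title as well as Author and LastModifiedTime.

```powershell
# HYPOTHETICAL, simplified model of the override behaviour observed in search results.
# It does not reproduce the real extractor's heuristics for picking a heading;
# it only shows that a non-empty extracted value replaces the original value.
function Resolve-ManagedPropertyValue {
    param(
        [string]$ExtractedValue,   # e.g. a heading pulled from the document body
        [string]$OriginalValue     # e.g. the Title from the Word properties or the list item
    )
    if (-not [string]::IsNullOrWhiteSpace($ExtractedValue)) {
        return $ExtractedValue     # the extracted value overrides everything else
    }
    return $OriginalValue          # nothing extracted: keep the original value
}

# Made-up sample values for the three fields that go through the same mechanism
$item = @{
    Title            = @{ Original = 'Q4 Budget Proposal'; Extracted = 'Confidential - Draft' }
    Author           = @{ Original = 'Jane Doe';           Extracted = '' }
    LastModifiedTime = @{ Original = '2013-12-01';         Extracted = '2013-11-15' }
}

foreach ($field in 'Title', 'Author', 'LastModifiedTime') {
    $resolved = Resolve-ManagedPropertyValue -ExtractedValue $item[$field].Extracted -OriginalValue $item[$field].Original
    '{0,-16} -> {1}' -f $field, $resolved
}
```

In this model the Title line shows how a heading such as "Confidential - Draft" replaces a perfectly good title set in the document properties, while Author keeps its original value because nothing was extracted.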

I guess that if you update this flow in a similar fashion to the approach in "Making synonyms visible in SharePoint 2013 search results", you will be able to disable the metadata extraction, but I think there is a better way to do this after installing the SharePoint Server 2013 October 2013 Cumulative Update. Stay tuned for more on this in a future blog post.

If you have found great blog posts about the Ceres engine, or other uses for it, please leave a comment.

