Saturday, September 15, 2012

About intent, recall, relevance and precision of search solutions

One of the challenges of designing a good search solution is understanding the range of different search requirements from users. When you look at the Google search screen (or, for that matter, a SharePoint search center screen), it is quite minimalistic. The trick is to work out the intention of the user who is performing the search.

So what is the user looking for?

The key to delivering a good search result is whether the search engine has understood the unstated question. The following traditional metrics are commonly used to measure the efficiency and effectiveness of search:

  • Relevance = whether a document is relevant depends on the intent behind your search. Relevant results help you achieve the goal that made you perform the search in the first place.
  • Recall = the proportion of all relevant documents that your search retrieves. Good recall means that few relevant documents are missing. If you know that there are 1000 relevant documents and the search retrieves 100 of them, the recall is 10%. This metric is hard to measure in real life, because you rarely know the full set of relevant documents, so you should use it during a search validation phase in a controlled, experimental environment.
  • Precision = the percentage of retrieved documents that are relevant. If your search retrieves 100 documents and 20 of these are relevant, your precision is 20%. The opposite measure, the share of retrieved documents that are irrelevant, is sometimes loosely called fallout: if you retrieve 100 documents and 20 are relevant, 80% of your results are noise. (Strictly speaking, fallout in information retrieval is the proportion of all non-relevant documents in the collection that the search retrieves.) This noise becomes a bigger problem as your search corpus and your result sets grow. Scanning 80 irrelevant documents to find 20 relevant ones may not be so bad, but with 1000 results returned it can be quite painful. Precision and recall typically pull in opposite directions: a broad query gives high recall but low precision, and a narrow query gives the reverse.
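To make the arithmetic in the list concrete, here is a minimal Python sketch of set-based precision, recall and fallout for a single query. It assumes documents are identified by IDs and that you already know which documents are relevant, which, as noted above, is usually only true in a controlled validation setup. The function name `retrieval_metrics`, the `noise` key and the document ID ranges are hypothetical, chosen only to reproduce the numbers used in the examples.

```python
def retrieval_metrics(retrieved, relevant, collection):
    """Compute set-based precision, recall, noise and fallout for one query.

    retrieved  -- IDs of documents the search returned
    relevant   -- IDs of documents judged relevant (known only in a test setup)
    collection -- IDs of every document in the searchable corpus
    """
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    hits = retrieved & relevant              # relevant documents that were found
    false_positives = retrieved - relevant   # irrelevant documents that were returned
    non_relevant = collection - relevant     # every irrelevant document in the corpus

    return {
        # share of the result set that is relevant
        "precision": len(hits) / len(retrieved) if retrieved else 0.0,
        # share of all relevant documents that the search found
        "recall": len(hits) / len(relevant) if relevant else 0.0,
        # share of the result set that is irrelevant (1 - precision)
        "noise": len(false_positives) / len(retrieved) if retrieved else 0.0,
        # strict IR fallout: share of all irrelevant documents that were retrieved
        "fallout": len(false_positives) / len(non_relevant) if non_relevant else 0.0,
    }


# Hypothetical corpus of 10,000 documents, of which IDs 0-999 are relevant.
corpus = range(10_000)
relevant_docs = range(1_000)

# Recall example: the search returns 100 documents, all of them relevant.
print(retrieval_metrics(range(100), relevant_docs, corpus))

# Precision example: 100 results, of which only 20 (IDs 980-999) are relevant.
print(retrieval_metrics(range(980, 1_080), relevant_docs, corpus))
```

The first call gives a precision of 1.0 and a recall of 0.1 (the 10% from the recall example). The second gives a precision of 0.2, a recall of 0.02 and a noise share of 0.8, while the strict fallout is only 80 out of 9,000 irrelevant documents, roughly 0.009. That gap is why it helps to be clear about which "fallout" you mean.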

It all starts with relevance, and relevance depends on the intent of the user. In any organization, users will have a range of different search requirements, depending on the purpose for which they are searching. If you are serious about building a good search experience, you should identify the common tasks that users are trying to accomplish when they perform a search. These requirements are best identified through the development of search personas and scenarios. For more information on identifying personas and on search information design, take a look at the links listed below:
