How Do You Measure the Impact of Tagging on Search Retrieval?

A client of mine wants to measure the difference between manual tagging and auto-classification of unstructured documents, focusing in particular on how each affects retrieval (i.e. relevance ranking). At the moment they are considering two contrasting approaches:

  1. Create a list of all the insertions and deletions (i.e. instances where the auto and manual tags differ for a given document), and sort them by frequency. Take those that appear more than a given number of times (say 20), and count how often they appear as search terms in the top 1000 queries from the past 6 months. Include exact matches (where a tag and a query term are identical) and partial matches (where a tag is wholly included in a query), but exclude everything else. For tags that don’t appear in the top 1000, assume a notional frequency of, say, 70. Then divide the resulting figure by the total number of queries over the past 6 months. This gives a measure of how important those insertions and deletions are, and thus of the impact of manual tagging on retrieval.

  2. Run a controlled experiment in which the tagging condition (manual vs. auto) is the independent variable and the relevance ranking is the dependent variable. Use a benchmark set of queries with relevance judgements, and calculate precision and recall for each condition.
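
Approach 1 can be sketched as a single score, purely to make its steps concrete. This is a minimal sketch under several assumptions not stated in the post: tags are given as a dict of document id to a set of tag strings for each condition; the top queries are a dict of query string to frequency; a tag's contribution is the summed frequency of the top queries it matches; and "wholly included in a query" is read as the tag's words appearing contiguously in the query (so "cat" does not match "category"). All function and parameter names are hypothetical.

```python
from collections import Counter

def differing_tag_counts(auto_tags, manual_tags):
    """Count, per tag, the documents where auto and manual tagging disagree.

    auto_tags / manual_tags: dict of doc_id -> set of tags (assumed format).
    The symmetric difference of the two sets covers insertions and deletions.
    """
    counts = Counter()
    for doc_id in auto_tags.keys() | manual_tags.keys():
        auto = auto_tags.get(doc_id, set())
        manual = manual_tags.get(doc_id, set())
        counts.update(auto ^ manual)
    return counts

def tag_matches_query(tag, query):
    """True for an exact match or when the tag's words appear contiguously in the query."""
    tag_words = tag.lower().split()
    query_words = query.lower().split()
    if not tag_words:
        return False
    n = len(tag_words)
    return any(query_words[i:i + n] == tag_words
               for i in range(len(query_words) - n + 1))

def tagging_impact(auto_tags, manual_tags, top_queries, total_queries,
                   min_count=20, notional_freq=70):
    """Hypothetical scoring of approach 1.

    top_queries: dict of query string -> frequency (e.g. top 1000 over 6 months).
    total_queries: total number of queries logged over the same period.
    """
    counts = differing_tag_counts(auto_tags, manual_tags)
    score = 0
    for tag, n in counts.most_common():  # sorted by frequency, most frequent first
        if n <= min_count:
            break  # every remaining tag is rarer, so none pass the threshold
        hits = sum(freq for query, freq in top_queries.items()
                   if tag_matches_query(tag, query))
        # Tags matching none of the top queries get the notional frequency instead.
        score += hits if hits > 0 else notional_freq
    return score / total_queries
```

For example, if the tag "microservices" differs in more than `min_count` documents and matches the queries "microservices" (50 occurrences) and "java microservices tutorial" (30), it contributes 80 to the numerator. Note that the score only weighs how often the disputed tags are searched for; it never looks at the rankings themselves.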
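
The controlled experiment in approach 2 can be sketched as follows. This assumes each tagging condition has been used to build its own index, both indexes have been queried with the same benchmark queries, and relevance judgements are available as sets of relevant document ids per query. Precision and recall are computed at a fixed cutoff k (10 by default here, an arbitrary choice) because the output is a ranked list; the function names and data layout are hypothetical.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision@k and recall@k for one query.

    ranking: doc ids in ranked order, as returned by the search engine.
    relevant: set of doc ids judged relevant for this query.
    """
    top_k = ranking[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def mean_precision_recall(run, qrels, k=10):
    """Average precision@k and recall@k over every judged benchmark query.

    run: dict of query id -> ranking; qrels: dict of query id -> relevant doc ids.
    A query missing from the run scores zero.
    """
    scores = [precision_recall_at_k(run.get(qid, []), relevant, k)
              for qid, relevant in qrels.items()]
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)

def compare_conditions(runs, qrels, k=10):
    """Score each tagging condition (e.g. "manual", "auto") on the same benchmark."""
    return {condition: mean_precision_recall(run, qrels, k)
            for condition, run in runs.items()}
```

Because both conditions are scored against the same queries and judgements, the tagging is the only thing that varies between the two runs, which is what makes the comparison controlled.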

Surprisingly (to me, at least) there seems to be some debate as to which is the best approach.

Which one would you choose, and why?

Published at DZone with permission of Tony Russell-Rose, DZone MVB. See the original article here.