A client of mine wants to measure the difference between manual tagging and auto-classification
of unstructured documents, focusing in particular on the impact on
retrieval (i.e. relevance ranking). At the moment they are considering
two contrasting approaches:
- Create a list of all the insertions and deletions (i.e. instances
where the auto and manual tags differ for a given document), and sort by
frequency. Take those that appear more than a given number of times (say
20), and count how often they appear as search terms in the top 1000
queries for the past 6 months. Include exact matches (where a tag and a
query term are identical), and partial matches (where a tag is wholly
included in a query), but exclude everything else. For tags that don’t
appear in the top 1000, assume a notional frequency of, say, 70. Then
divide the resulting sum by the total number of queries over the past 6
months. This gives you a measure of how important those insertions and
deletions are, and thus the impact of manual tagging on retrieval.
- Run a controlled experiment in which the tagging condition is the
independent variable and the relevance ranking is the dependent
variable. Use a benchmark set of queries and relevance judgements, and
calculate precision and recall.
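
To make the two options concrete, here is a minimal Python sketch of how I read each one. It is not the client's actual procedure. The function names, the data shapes (dicts mapping document ids to tag sets, and query strings to frequencies), the token-level reading of "wholly included", the choice to sum matched query frequencies per tag, and the cutoff k are all my own assumptions.

```python
from collections import Counter


def tag_diffs(manual, auto):
    """Count insertions and deletions: tags assigned under one condition only."""
    diffs = Counter()
    for doc_id in manual.keys() | auto.keys():
        # Symmetric difference = tags on which manual and auto disagree for this document.
        diffs.update(set(manual.get(doc_id, ())) ^ set(auto.get(doc_id, ())))
    return diffs


def contains_phrase(query, tag):
    """True if the tag's tokens occur contiguously in the query (exact matches included)."""
    q, t = query.lower().split(), tag.lower().split()
    return bool(t) and any(q[i:i + len(t)] == t for i in range(len(q) - len(t) + 1))


def query_log_share(diffs, top_queries, total_queries, min_count=20, notional=70):
    """Approach 1 (assumed reading): share of query volume touched by differing tags."""
    volume = 0
    for tag, count in diffs.items():
        if count <= min_count:  # keep only tags that differ more than min_count times
            continue
        hits = sum(freq for query, freq in top_queries.items()
                   if contains_phrase(query, tag))
        # Tags with no match in the top queries get the notional frequency instead.
        volume += hits if hits else notional
    return volume / total_queries


def precision_recall(ranked, relevant, k=10):
    """Approach 2: precision and recall at cutoff k for a single benchmark query."""
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For the second approach you would call precision_recall once per benchmark query under each tagging condition (manual tags indexed, then auto tags indexed) and compare the averages. Written out this way, the contrast is that the first function never looks at a ranking or a relevance judgement at all.
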
Surprisingly (to me, at least) there seems to be some debate as to which is the better approach.
Which one would you choose, and why?