Summarize Opinions With a Graph – Part 1
How does the saying go? Opinions are like bellybuttons, everybody’s got one? So let’s say you have an opinion that NoSQL is not for you. Maybe you read my blog and think this graph database stuff is great for recommendation engines and path finding and maybe some other stuff, but you’ve got really hard problems and it can’t help you.
I am going to try to show you that a graph database can help you solve your really hard problems if you can frame your problem in terms of a graph. Did I say “you”? I meant anybody, especially Ph.D. students. One trick is to search for “graph-based approach to” and your problem.
I’ll give you an example. The other day I ran into “Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions,” by Kavita Ganesan, ChengXiang Zhai and Jiawei Han at the University of Illinois at Urbana-Champaign. Here is the abstract:
We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.
What does that mean? It means Opinosis takes the free-form text people write in reviews, aggregates it, and makes something useful out of it, so I can read one sentence instead of 1,000 when I want to know what reviewers are saying.
How is this useful? Most companies want to know what their customers are saying about them, but nobody has time to read 1,000 responses to that customer survey. So generate a summary instead. eBay feedback? Twitter posts about a specific hashtag? Text of support e-mails? You get the picture.
Let’s dive into what this means with an example that everyone is familiar with: e-commerce.
You can see the 1 to 5 star ratings and you already know how to build a recommendation algorithm out of this. We also know how to use personalization to predict what star rating a user will give, but we want to ask a different question. Can we summarize what people are saying about this product? We want to do this because all our competitors are also giving items 1 to 5 star ratings, and they are also telling you what rating they think you’ll give this item. But it’s not enough. We turned to graph databases to get that little bit extra. That feature none of our competitors are offering: that secret sauce, that edge.
We are going to take the things people are saying about the products we sell and generate a graph out of them, find the paths most traveled, and combine them to build our summary. An illustration might help:
Today we are just going to look at Step 1. Our input is going to be these two sentences:
My phone calls drop frequently with the iPhone.
Great device, but the calls drop too frequently.
With these, we can generate the following graph:
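To give a rough idea of the data behind that picture, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the Opinosis implementation: each node is a lowercased word (the paper also attaches a part-of-speech tag, which I skip here), each node remembers the sentence and position where it appeared, and an edge links a word to the word that directly follows it. The names `tokenize` and `build_graph` are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(sentence):
    # Simplification: lowercase words only, punctuation dropped, no POS tags.
    return re.findall(r"[a-z']+", sentence.lower())


def build_graph(sentences):
    positions = defaultdict(list)  # word -> [(sentence_id, position), ...]
    edges = defaultdict(set)       # word -> words that directly follow it
    for sid, sentence in enumerate(sentences, start=1):
        words = tokenize(sentence)
        for pid, word in enumerate(words, start=1):
            positions[word].append((sid, pid))
        for current_word, next_word in zip(words, words[1:]):
            edges[current_word].add(next_word)
    return positions, edges


sentences = [
    "My phone calls drop frequently with the iPhone.",
    "Great device, but the calls drop too frequently.",
]
positions, edges = build_graph(sentences)

for word, refs in positions.items():
    print(word, refs, "->", sorted(edges[word]))
```

In this sketch the word "calls" ends up as a single node holding two references, (1, 3) and (2, 5), because it is the third word of the first sentence and the fifth word of the second.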
One interesting property of this graph is that it naturally captures redundancies. The paths shared by two sentences are captured by the nodes, and this sharing is what allows us to have high confidence in the summaries we build.
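One way to see that redundancy in code is to record which sentences travel over each word-to-word edge and keep only the edges used by more than one sentence. This is a hedged sketch using the same two sentences, with plain lowercased words as nodes (an assumption of mine, simpler than what the paper does); `tokenize`, `edge_sentences` and `shared` are hypothetical names.

```python
import re
from collections import defaultdict

sentences = [
    "My phone calls drop frequently with the iPhone.",
    "Great device, but the calls drop too frequently.",
]


def tokenize(sentence):
    # Simplification: lowercase words only, punctuation dropped.
    return re.findall(r"[a-z']+", sentence.lower())


# (word, next_word) -> ids of the sentences that contain that adjacent pair
edge_sentences = defaultdict(set)
for sid, sentence in enumerate(sentences, start=1):
    words = tokenize(sentence)
    for current_word, next_word in zip(words, words[1:]):
        edge_sentences[(current_word, next_word)].add(sid)

# Edges travelled by more than one sentence are the redundant part of the graph.
shared = {edge: sids for edge, sids in edge_sentences.items() if len(sids) > 1}
print(shared)
```

With only two sentences and strict adjacency, the one shared edge is "calls" followed by "drop". With a thousand reviews, the heavily shared edges are the ones worth building a summary from.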
Another property the graph has is that it can handle gaps between words, which helps us see the redundancy and allows us to discover new sentences.
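To make the gap idea concrete, here is a small sketch that asks which sentences support "drop" being followed by "frequently" when we tolerate a few words in between. The `max_gap` parameter and the `supporting_sentences` helper are my own illustrative names and a simplification, not the paper's exact rule.

```python
import re

sentences = [
    "My phone calls drop frequently with the iPhone.",
    "Great device, but the calls drop too frequently.",
]


def tokenize(sentence):
    # Simplification: lowercase words only, punctuation dropped.
    return re.findall(r"[a-z']+", sentence.lower())


def supporting_sentences(sentences, first, second, max_gap):
    """Ids of sentences where `second` appears within `max_gap` words after `first`."""
    supporters = set()
    for sid, sentence in enumerate(sentences, start=1):
        words = tokenize(sentence)
        for i, word in enumerate(words):
            if word == first and second in words[i + 1 : i + 1 + max_gap]:
                supporters.add(sid)
    return supporters


strict = supporting_sentences(sentences, "drop", "frequently", max_gap=1)
relaxed = supporting_sentences(sentences, "drop", "frequently", max_gap=2)
print(strict, relaxed)
```

With `max_gap=1` only the first sentence counts, because "too" sits between the two words in the second one. Allowing a gap of 2 lets both sentences back the phrase "drop frequently", which is the redundancy we want to see.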
A third interesting property about this graph is that it allows us to join similar sentences together:
Think about how these properties are going to help us build a summary that represents what our users are saying, and we’ll tackle building the graph in part 2.