Authors: Lu Wang, Wang Ling
The paper generates opinion summaries with an encoder-decoder neural model. The problem is formulated as: given an input consisting of a set of reviews/arguments, construct a single sentence that summarizes them.
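To pin down the input/output shape, here is a trivial sketch of the task interface. The function name and the body are my own placeholders, not the paper's method; the real system is a neural encoder-decoder, whereas this stand-in just picks an input extractively.

```python
from typing import List

def summarize(reviews: List[str]) -> str:
    """Task interface: many input texts in, one summary sentence out.
    The body is a toy extractive stand-in, not the paper's model."""
    # Placeholder heuristic: return the shortest input as the "summary".
    return min(reviews, key=len)
```

The point is only the signature: a set of reviews/arguments maps to a single sentence.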
Novelty (to me)
Data collection: a movie review dataset (Rotten Tomatoes) and a debate dataset (Idebate).
Importance-based sampling: the input contains not one but many sentences, and simply concatenating them into one long input sequence slows the model down. Instead, a subset of sentences is sampled from an “importance” distribution.
Human evaluation of summaries: a portion of the output is manually judged by fluent English speakers on three criteria (informativeness, grammaticality, compactness).
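The sampling step in the second item can be sketched as weighted sampling without replacement. All names here are mine, and the toy scores stand in for the learned regression scores the paper actually uses:

```python
import random

def sample_sentences(sentences, scores, k, seed=0):
    """Sample k input sentences in proportion to importance scores.
    A sketch only: the paper derives the distribution from a learned
    regression model, not from hand-supplied weights."""
    rng = random.Random(seed)
    chosen = []
    pool = list(zip(sentences, scores))
    for _ in range(min(k, len(pool))):
        # Draw one sentence with probability proportional to its score,
        # then remove it so it cannot be drawn again.
        total = sum(s for _, s in pool)
        r = rng.uniform(0, total)
        acc = 0.0
        for i, (sent, s) in enumerate(pool):
            acc += s
            if acc >= r:
                chosen.append(sent)
                pool.pop(i)
                break
    return chosen
```

Feeding the decoder only these k sentences keeps the input sequence short regardless of how many reviews arrive.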
Importance-based sampling (section 3.5): we want to weight each sentence by how useful it is. This is cast as a regression problem: the score of each sentence is the amount of content it shares with the gold summary. An extra term is added to enforce a margin between the scores of related and unrelated sentences. I am not sure exactly how this term is constructed, though. The formula takes differences between sentences with labels > 0 and sentences with labels = 0. My best reading is that a label of 0 means the sentence shares no content words with the gold summary, so the term pushes related sentences to score above entirely unrelated ones.
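My reading of the section 3.5 objective, sketched as code. This is an interpretation, not the paper's exact formula; variable names and the squared-error form are my assumptions:

```python
def importance_loss(pred, labels, margin=1.0):
    """Sketch of the sentence-importance objective as I read it:
    a regression fit to the overlap-based labels, plus a hinge term
    pushing every related sentence (label > 0) to score at least
    `margin` above every unrelated one (label = 0).
    An interpretation of section 3.5, not the paper's exact formula."""
    # Regression part: fit the content-overlap labels directly.
    reg = sum((p - y) ** 2 for p, y in zip(pred, labels))
    # Margin part: pairwise hinge between related and unrelated sentences.
    related = [p for p, y in zip(pred, labels) if y > 0]
    unrelated = [p for p, y in zip(pred, labels) if y == 0]
    hinge = sum(max(0.0, margin - (pr - pu))
                for pr in related for pu in unrelated)
    return reg + hinge
```

Under this reading the margin term is cheap to satisfy whenever the regression is already accurate, since related sentences have strictly positive labels to begin with.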
Results (section 5): One interesting thing is that the model produces consistently short summaries across datasets. In table 3, while LexRank and Submodular increase their summary lengths substantially (from 16 to over 20) when going from RottenTomatoes to Idebate, the neural model shortens its summaries. There is a tradeoff here, I guess, because its METEOR and ROUGE scores, which are recall-oriented, are lower for Idebate, i.e. the model leaves out details in its summaries.
The human evaluation section is very exciting. I was like “Hmm, I don’t really trust these metrics. I need real people to look at them.” and there it is to convince me. While the neural model does very well in this evaluation, LexRank is not bad at all.
This paper demonstrates the many cool abilities of the encoder-decoder model. It understands content, generates grammatically correct sentences, and knows how to make them concise. I am particularly fond of the last ability. If I am not wrong, there is nothing in the model specification that constrains it to keep the summary short. But still, it learns a sense of conciseness just from seeing examples.
I wonder why they didn’t conduct human evaluation on Idebate.
And I hope they will give a cute name to their model so I won’t have to call it “the neural model”.