Extracting a small number of representative keyphrases from a large set of microblogs is a useful but challenging task. In this paper, we analyze 20 sets of microblogs and find that people often use different phrases to express the same information unit, and that many of these phrases are similar to one another. We therefore propose a context-sensitive topical PageRank method based on similarity features, which ranks keyphrases after topic decomposition with the author-topic model. We evaluate the proposed method on a large microblog dataset. Experiments show that our system is effective for keyphrase extraction, especially at surfacing keyphrases that are scattered across many variant forms.
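To make the ranking idea concrete, the following is a minimal sketch of a topic-biased PageRank over candidate phrases in which edges are strengthened by phrase similarity. It is an illustration under stated assumptions, not the method evaluated in the paper: the word-level Jaccard similarity, the `sim_weight` mixing parameter, the co-occurrence edge weights, and the function names `jaccard` and `topical_pagerank` are all hypothetical choices, and the per-topic preference values are assumed to come from a topic model such as the author-topic model.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two phrases (assumed similarity feature)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def topical_pagerank(phrases, cooccur, topic_pref,
                     damping=0.85, sim_weight=0.5, iters=50):
    """Rank phrases for one topic with a preference-biased PageRank.

    phrases    : list of candidate phrases (graph nodes)
    cooccur    : {(u, v): positive count}, with u and v drawn from `phrases`
    topic_pref : {phrase: topic-specific preference}, e.g. from a topic model
    """
    # Symmetric edge weights: co-occurrence counts plus a similarity bonus,
    # so that variant forms of the same information unit reinforce each other.
    weight = {}
    for (u, v), count in cooccur.items():
        weight[(u, v)] = weight.get((u, v), 0.0) + count
        weight[(v, u)] = weight.get((v, u), 0.0) + count
    for i, u in enumerate(phrases):
        for v in phrases[i + 1:]:
            bonus = sim_weight * jaccard(u, v)
            if bonus > 0:
                weight[(u, v)] = weight.get((u, v), 0.0) + bonus
                weight[(v, u)] = weight.get((v, u), 0.0) + bonus

    # Total outgoing weight per node, used to normalize transitions.
    out_sum = {u: 0.0 for u in phrases}
    for (u, _v), w in weight.items():
        out_sum[u] += w

    # Normalize the topic preference into a distribution (uniform fallback).
    total = sum(topic_pref.get(p, 0.0) for p in phrases)
    if total > 0:
        pref = {p: topic_pref.get(p, 0.0) / total for p in phrases}
    else:
        pref = {p: 1.0 / len(phrases) for p in phrases}

    score = dict(pref)
    for _ in range(iters):
        # Mass held by nodes without edges is redistributed by preference.
        dangling = sum(score[u] for u in phrases if out_sum[u] == 0)
        nxt = {p: (1 - damping) * pref[p] + damping * dangling * pref[p]
               for p in phrases}
        for (u, v), w in weight.items():
            nxt[v] += damping * score[u] * w / out_sum[u]
        score = nxt
    return score


# Toy input: three variant phrases for one information unit plus an unrelated one.
phrases = ["steve jobs", "jobs", "apple ceo", "weather"]
cooccur = {("steve jobs", "apple ceo"): 3, ("jobs", "apple ceo"): 1}
topic_pref = {"steve jobs": 0.5, "jobs": 0.2, "apple ceo": 0.3, "weather": 0.0}
scores = topical_pagerank(phrases, cooccur, topic_pref)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the similarity bonus links "steve jobs" and "jobs" even though they never co-occur in the toy input, which is one simple way that variant forms could share ranking mass.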