Venue: 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement, 2009

Automatic creation of a reference corpus for political opinion mining in user-generated content



Abstract

We propose and evaluate a method for automatically creating a reference corpus for training text classification procedures that mine political opinions in user-generated content. The process starts by compiling a collection of highly opinionated comments posted by users on an online newspaper. We then define and apply a set of manually crafted, high-precision rules, supported by a large sentiment lexicon, to identify sentences in each comment that express opinions about political entities. Finally, the opinions found are propagated to the remaining sentences of the comment that mention the same entities, which increases the number and variety of opinion-bearing sentences. Results show that most of the rules identify negative opinions with very high precision, and that these opinions can be safely propagated to the remaining sentences of the comment in almost 100% of cases. Because of irony, identification precision drops for positive opinions, although several rules still reach high precision. Propagation of positive opinions is correct in about 77% of cases, and most errors at this stage result from irony and polarity inversion over the course of the comment.
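The two-stage idea in the abstract (rule-based seeding, then propagation within a comment) can be illustrated with a minimal sketch. The abstract does not give the actual rules, lexicon, or entity list, so everything named here is a hypothetical stand-in: the tiny word sets, the entity names, the single pattern `<entity> is (a|an) <lexicon word>`, and the helper names `rule_polarity` and `label_comment`.

```python
import re
from dataclasses import dataclass

# Toy stand-ins (assumptions): the real lexicon, rules and entity list
# are not given in the abstract.
NEGATIVE_WORDS = {"liar", "corrupt", "incompetent"}
POSITIVE_WORDS = {"honest", "competent", "brilliant"}
ENTITIES = {"Smith", "Jones"}  # hypothetical political entity names


@dataclass
class Labeled:
    sentence: str
    entity: str
    polarity: str  # "neg" or "pos"
    source: str    # "rule" (seed) or "propagated"


def split_sentences(comment):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", comment) if s.strip()]


def rule_polarity(sentence, entity):
    """Hypothetical high-precision rule: '<entity> is (a|an)? <lexicon word>'."""
    m = re.search(rf"\b{re.escape(entity)} is (?:an? )?(\w+)", sentence)
    if not m:
        return None
    word = m.group(1).lower()
    if word in NEGATIVE_WORDS:
        return "neg"
    if word in POSITIVE_WORDS:
        return "pos"
    return None


def label_comment(comment):
    """Seed sentences with the rule, then propagate the polarity to the
    other sentences of the same comment that mention the same entity."""
    sentences = split_sentences(comment)
    labeled = []
    for entity in sorted(ENTITIES):
        mentions = [s for s in sentences
                    if re.search(rf"\b{re.escape(entity)}\b", s)]
        seeds = {s: rule_polarity(s, entity) for s in mentions}
        polarities = {p for p in seeds.values() if p}
        if len(polarities) != 1:
            continue  # no seed, or conflicting seeds: label nothing
        polarity = polarities.pop()
        for s in mentions:
            source = "rule" if seeds[s] else "propagated"
            labeled.append(Labeled(s, entity, polarity, source))
    return labeled


if __name__ == "__main__":
    text = "Smith is a liar. I never trusted Smith on taxes. The weather is nice."
    for item in label_comment(text):
        print(item)
```

In this sketch the first sentence is labeled by the rule and the second receives the same negative label by propagation, while the sentence that mentions no entity is left out. Nothing here detects irony or polarity inversion, which the abstract names as the main sources of error for positive opinions.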
