Conference: Annual meeting of the American Society for Information Science and Technology

Semi-supervised Probabilistic Sentiment Analysis: Merging Labeled Sentences with Unlabeled Reviews to Identify Sentiment



Abstract

Document-level sentiment analysis, the task of determining whether the sentiment expressed in a document is positive or negative, is commonly performed by supervised methods. As with all supervised tasks, obtaining training data for these methods can be expensive and time-consuming. Some semi-supervised approaches that rely on sentiment lexicons have been proposed. We propose a novel supervised and a novel semi-supervised sentiment analysis method, both based on a probabilistic graphical model and neither requiring a lexicon. Our semi-supervised method takes advantage of the numerical ratings that are often included in online reviews (e.g., 4 out of 5 stars). While these numerical ratings are related to sentiment, they are noisy and hence, by themselves, an imperfect indicator of a review's sentiment. We incorporate unlabeled user reviews as training data by treating the reviews' numerical ratings as sentiment labels while modeling the ratings' noisy nature. Our empirical results, using a corpus of labeled sentences from hotel reviews and unlabeled hotel reviews with numerical ratings, show that treating review ratings as noisy and using them to augment a small amount of labeled sentences outperforms strong existing supervised and semi-supervised approaches, both classification-based and lexicon-based.
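The abstract does not give the model's structure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' model. It swaps in a naive Bayes classifier (one of the simplest probabilistic graphical models) fit by expectation-maximization: labeled sentences contribute hard labels, while each unlabeled review's hidden sentiment is inferred from its words and from its star rating, which is treated as a noisy observation through a learned table P(stars | sentiment). The function names, the review weight `lam`, the smoothing constant `alpha`, the 5-star scale, and the initial rating pseudo-counts are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

CLASSES = (0, 1)          # assumed binary labels: 0 = negative, 1 = positive
STARS = (1, 2, 3, 4, 5)   # assumed 5-star rating scale


def m_step(labeled, reviews, posts, vocab, lam, alpha):
    """Re-estimate class priors, word likelihoods and the noisy rating table."""
    word_counts = {c: Counter() for c in CLASSES}
    doc_counts = {c: 0.0 for c in CLASSES}
    # Hypothetical pseudo-counts: low stars lean negative, high stars lean positive.
    star_counts = {0: {s: float(6 - s) for s in STARS},
                   1: {s: float(s) for s in STARS}}
    for toks, y in labeled:                            # hard labels from sentences
        doc_counts[y] += 1.0
        for w in toks:
            word_counts[y][w] += 1.0
    for (toks, stars), post in zip(reviews, posts):    # soft labels from reviews
        for c in CLASSES:
            weight = lam * post[c]
            doc_counts[c] += weight
            star_counts[c][stars] += weight
            for w in toks:
                word_counts[c][w] += weight
    total = sum(doc_counts.values())
    prior = {c: (doc_counts[c] + alpha) / (total + alpha * len(CLASSES))
             for c in CLASSES}
    log_pw = {}
    for c in CLASSES:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_pw[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    noise = {c: {s: star_counts[c][s] / sum(star_counts[c].values()) for s in STARS}
             for c in CLASSES}
    return prior, log_pw, noise


def posterior(toks, stars, prior, log_pw, noise):
    """P(sentiment | words[, rating]); out-of-vocabulary words are ignored."""
    scores = {}
    for c in CLASSES:
        s = math.log(prior[c]) + sum(log_pw[c].get(w, 0.0) for w in toks)
        if stars is not None:
            s += math.log(noise[c][stars])   # the rating is a noisy view of c
        scores[c] = s
    top = max(scores.values())               # log-sum-exp for numerical stability
    exps = {c: math.exp(v - top) for c, v in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train(labeled, reviews, iterations=10, lam=0.5, alpha=1.0):
    vocab = {w for toks, _ in labeled + reviews for w in toks}
    params = m_step(labeled, reviews, [], vocab, lam, alpha)   # labeled-only start
    for _ in range(iterations):
        posts = [posterior(toks, stars, *params) for toks, stars in reviews]  # E-step
        params = m_step(labeled, reviews, posts, vocab, lam, alpha)           # M-step
    return params


def predict(toks, params):
    post = posterior(toks, None, *params)    # no rating available at test time
    return max(post, key=post.get)


if __name__ == "__main__":
    labeled = [("great clean room".split(), 1), ("rude staff dirty room".split(), 0)]
    reviews = [("great location friendly staff".split(), 5),
               ("dirty bathroom rude desk".split(), 1),
               ("clean but noisy".split(), 3)]
    params = train(labeled, reviews)
    print(predict("friendly location".split(), params))  # prints 1
```

In this toy run, "friendly" and "location" never appear in the labeled sentences; they become positive evidence only through the 5-star review, which is the kind of benefit the abstract attributes to adding rating-labeled reviews. Setting `lam` below 1 keeps a small labeled set from being swamped by many rating-labeled reviews; this down-weighting is a common heuristic for EM with unlabeled data, not something the abstract describes.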
