
Probabilistic Models for Fine-Grained Opinion Mining: Algorithms and Applications.

Abstract

Public sentiment expressed in online debates, discussions, and comments is crucial to government agencies for passing new bills and policies, gauging upheaval, predicting elections, and so on. However, leveraging the sentiments expressed in social opinions poses two major challenges: (1) fine-grained opinion mining, and (2) filtering opinion spam to ensure credible opinion mining. We start by mining opinions from social conversations, focusing on fine-grained sentiment dimensions such as agreement (e.g., "I'd agree") and disagreement (e.g., "I refute"). This is a major departure from the traditional polar (positive/negative) sentiments (e.g., good, nice vs. poor, bad) of standard opinion mining. In the domain of debates, we propose joint topic and sentiment models that discover agreement and disagreement expressions, as well as contention points (topics), at both the discussion level and the individual post level. The proposed models also encode interactions among discussants through quoting and replying relations. Next, we address the problem of semantic incoherence in aspect extraction through knowledge induction using seeds. Seeds are user-defined coarse groupings that guide the modeling process. Specifically, we build on topic models to propose novel aspect-specific sentiment models guided by aspect seeds. The latter part of this thesis proposes solutions for detecting opinion spam. Opinion spam refers to "illegitimate" human activities (e.g., writing fake reviews) that try to mislead readers by giving undeserved opinions or ratings to some entities (e.g., hotels, products) in order to promote or demote them. We address two problems in opinion spam. The first is group spam, i.e., a group of spammers working in collusion. We propose a novel relational ranking algorithm, GSRank, which ranks spam groups based on mutual reinforcement. The second problem is opinion spam detection in the absence of labeled data.
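The abstract states that GSRank ranks spam groups by mutual reinforcement but does not give its update rules. The Python sketch below only illustrates the general mutual-reinforcement idea under assumed, hypothetical rules: a member's score is the mean score of the groups it belongs to, and a group's score mixes its own prior evidence with the mean score of its members. It is not GSRank itself; the function name `mutual_reinforcement_rank`, the `mix` weight, the max-normalization, and the example groups and priors are all illustrative assumptions.

```python
from typing import Dict, List


def mutual_reinforcement_rank(
    groups: Dict[str, List[str]],
    group_prior: Dict[str, float],
    iterations: int = 50,
    mix: float = 0.5,
) -> Dict[str, float]:
    """Toy mutual-reinforcement ranking of candidate spam groups.

    Hypothetical update rules (illustration only, not GSRank):
    - a member's score is the mean score of the groups it belongs to;
    - a group's score mixes its own prior evidence with the mean score
      of its members.
    Scores are divided by the current maximum each round, so the top
    group always scores 1.0. Assumes every group has at least one member.
    """
    group_score = dict(group_prior)
    members = {m for ms in groups.values() for m in ms}
    for _ in range(iterations):
        # Members inherit suspicion from every group they appear in.
        member_score = {}
        for m in members:
            scores = [group_score[g] for g, ms in groups.items() if m in ms]
            member_score[m] = sum(scores) / len(scores)
        # Groups are reinforced by how suspicious their members are.
        raw = {
            g: mix * group_prior[g]
            + (1 - mix) * sum(member_score[m] for m in ms) / len(ms)
            for g, ms in groups.items()
        }
        top = max(raw.values()) or 1.0  # avoid dividing by zero
        group_score = {g: s / top for g, s in raw.items()}
    return group_score


if __name__ == "__main__":
    # Hypothetical candidate groups; member "c" belongs to both g1 and g2.
    groups = {"g1": ["a", "b", "c"], "g2": ["c", "d"], "g3": ["e", "f"]}
    prior = {"g1": 0.9, "g2": 0.4, "g3": 0.1}
    scores = mutual_reinforcement_rank(groups, prior)
    for g in sorted(scores, key=scores.get, reverse=True):
        print(g, round(scores[g], 3))
```

The point of the coupling is that a group sharing members with a highly suspicious group gains suspicion itself, which a per-group score computed in isolation cannot capture.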
Detecting spam without labeled data is important because manually labeling fake reviews or reviewers is difficult and error-prone. Our solution rests on the hypothesis that spammers differ markedly from other users along behavioral dimensions, creating a distributional divergence between two latent population clusters: spammers and non-spammers. By modeling each user's spamicity as a latent variable with observed behavioral footprints, we propose novel generative models for detecting opinion spam and fraud.
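The hypothesis of two latent populations with different behavioral distributions can be illustrated with a generic, textbook two-component Bernoulli mixture fitted by expectation-maximization (EM). This is a swapped-in technique for illustration, not the generative models proposed in the thesis. The binary behavioral features (for example, a burst of reviews on one day or an extreme rating), the function name `fit_two_cluster_bernoulli`, the initialization, and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_two_cluster_bernoulli(
    X: List[List[int]], iterations: int = 100
) -> Tuple[List[float], List[List[float]], float]:
    """EM for a two-component Bernoulli mixture over binary behavior features.

    X[i][j] is 1 if user i shows hypothetical behavior j, else 0.
    Returns (posterior that each user belongs to cluster 1,
    per-cluster feature probabilities theta[k][j], weight of cluster 1).
    """
    n, d = len(X), len(X[0])
    eps = 1e-9
    # Asymmetric start: cluster 1 begins with higher feature rates, so it
    # tends to capture the "abnormal" population (EM labels are otherwise
    # arbitrary up to swapping).
    theta = [[0.3] * d, [0.7] * d]
    pi = 0.5
    post = [0.5] * n
    for _ in range(iterations):
        # E-step: responsibility of cluster 1 for each user (log-space).
        for i, x in enumerate(X):
            log_p = []
            for k, weight in ((0, 1.0 - pi), (1, pi)):
                lp = math.log(weight + eps)
                for j in range(d):
                    t = min(max(theta[k][j], eps), 1.0 - eps)
                    lp += math.log(t) if x[j] else math.log(1.0 - t)
                log_p.append(lp)
            top = max(log_p)
            p0, p1 = (math.exp(v - top) for v in log_p)
            post[i] = p1 / (p0 + p1)
        # M-step: re-estimate mixing weight and per-cluster feature rates.
        r1 = sum(post)
        r0 = n - r1
        pi = r1 / n
        for j in range(d):
            theta[1][j] = sum(post[i] * X[i][j] for i in range(n)) / (r1 + eps)
            theta[0][j] = sum((1 - post[i]) * X[i][j] for i in range(n)) / (r0 + eps)
    return post, theta, pi


if __name__ == "__main__":
    # Hypothetical users x [burst, extreme_rating, duplicate_text].
    X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
         [0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
    post, theta, pi = fit_two_cluster_bernoulli(X)
    print([round(p, 2) for p in post])
```

The posterior for cluster 1 can be read as a rough spamicity score, but which cluster corresponds to spammers should be checked by inspecting `theta`, since EM does not name the clusters.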
Bibliographic record

  • Author

    Mukherjee, Arjun

  • Author affiliation

    University of Illinois at Chicago

  • Degree-granting institution: University of Illinois at Chicago
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 146
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Remote sensing technology
