Published in: IEEE International Conference on Data Mining

Mining Contentious Documents Using an Unsupervised Topic Model Based Approach



Abstract

This work proposes an unsupervised method intended to improve the quality of opinion mining in contentious text. It presents a Joint Topic Viewpoint (JTV) probabilistic model for analysing the divergent arguing expressions that may be present in a collection of contentious documents. JTV extends the original Latent Dirichlet Allocation (LDA), which makes it domain- and thesaurus-independent; for example, it does not rely on WordNet coverage. JTV has the potential to automatically carry out the task of extracting the associated terms that denote an arguing expression, according to the hidden topic it discusses and the embedded viewpoint it voices. Furthermore, JTV's structure enables the unsupervised grouping of the obtained arguing expressions according to their viewpoints, using a constrained clustering approach. Experiments are conducted on three types of contentious documents: polls, online debates, and editorials. The qualitative and quantitative analysis of the experimental results shows the effectiveness of our model in handling six different contentious issues when compared to a state-of-the-art method. Moreover, the ability to automatically generate distinctive and informative patterns of arguing expressions is demonstrated.
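The abstract does not spell out JTV's structure, so the following is only an illustrative sketch of what an LDA-style generative story with an added viewpoint variable could look like. It assumes each word is drawn from a distribution indexed by a (topic, viewpoint) pair; the function names, hyperparameter values, and dimensions are hypothetical and not taken from the paper.

```python
import random

def dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) via normalised gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, theta, psi, phi, rng):
    """Sample (topic, viewpoint, word_id) triples for one document.

    theta: the document's topic proportions (as in plain LDA)
    psi:   per-topic viewpoint proportions (assumed extension)
    phi:   word distribution for every (topic, viewpoint) pair (assumed)
    """
    tokens = []
    for _ in range(n_words):
        k = rng.choices(range(len(theta)), weights=theta)[0]          # topic
        v = rng.choices(range(len(psi[k])), weights=psi[k])[0]        # viewpoint
        w = rng.choices(range(len(phi[k][v])), weights=phi[k][v])[0]  # word id
        tokens.append((k, v, w))
    return tokens

# Hypothetical sizes chosen only for illustration.
N_TOPICS, N_VIEWPOINTS, VOCAB_SIZE = 3, 2, 10
rng = random.Random(0)

phi = [[dirichlet(1.0, VOCAB_SIZE, rng) for _ in range(N_VIEWPOINTS)]
       for _ in range(N_TOPICS)]
theta = dirichlet(1.0, N_TOPICS, rng)
psi = [dirichlet(1.0, N_VIEWPOINTS, rng) for _ in range(N_TOPICS)]

doc = generate_document(20, theta, psi, phi, rng)
```

In a model of this shape, the words with the highest probability under one (topic, viewpoint) pair would play the role of the "associated terms denoting an arguing expression" mentioned in the abstract. Inference and the constrained clustering step are not sketched here.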
