Mining Contentious Documents Using an Unsupervised Topic Model Based Approach



Abstract

This work proposes an unsupervised method intended to enhance the quality of opinion mining in contentious text. It presents a Joint Topic Viewpoint (JTV) probabilistic model to analyse the underlying divergent arguing expressions that may be present in a collection of contentious documents. It extends the original Latent Dirichlet Allocation (LDA), which makes it domain- and thesaurus-independent, e.g., it does not rely on WordNet coverage. The conceived JTV has the potential to automatically carry out the tasks of extracting associated terms denoting an arguing expression, according to the hidden topics it discusses and the embedded viewpoint it voices. Furthermore, JTV's structure enables the unsupervised grouping of the obtained arguing expressions according to their viewpoints, using a constrained clustering approach. Experiments are conducted on three types of contentious documents: polls, online debates and editorials. The qualitative and quantitative analysis of the experimental results shows the effectiveness of our model in handling six different contentious issues when compared to a state-of-the-art method. Moreover, the ability to automatically generate distinctive and informative patterns of arguing expressions is demonstrated.
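
The abstract states that JTV extends LDA but does not give JTV's generative process, so JTV itself is not reproduced here. As a point of reference only, the following is a minimal sketch of plain LDA trained with collapsed Gibbs sampling. The function name `lda_gibbs`, the hyperparameter values, and the toy corpus are all assumptions made for illustration and do not come from the paper.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a baseline sketch, not JTV)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assign = []                                          # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Hypothetical toy corpus; real input would be tokenised contentious documents.
docs = [
    "insurance coverage cost insurance premium".split(),
    "government mandate law government tax".split(),
    "insurance premium cost coverage".split(),
    "law tax government mandate".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

The sketch recovers topics only. According to the abstract, JTV additionally models the viewpoint each arguing expression voices and groups expressions by viewpoint with constrained clustering, neither of which is attempted here.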


Bibliographic details

Source
IEEE International Conference on Data Mining | 2014 | pp. 550-559 | 10 pages
Authors
Trabelsi Amine; Zaiane Osmar R.;
Original format: PDF
Keywords
data mining; document handling; probability; JTV probabilistic model; LDA; Word Net coverage; arguing expression; contentious document mining; editorials; joint topic viewpoint probabilistic model; latent Dirichlet allocation; online debates; opinion mining; polls; qualitative analysis; quantitative analysis; unsupervised grouping; unsupervised method; unsupervised topic model based approach; Data mining; Data models; Editorials; Government; Insurance; Joints; Medical services;


Similar literature

1. Unsupervised mining of long time series based on latent topic model [J]. Jin Wang, Xiangping Sun, Mary F.H. She. Neurocomputing, 2013, Issue MARa1.
2. Topic modeling for sequential documents based on hybrid inter-document topic dependency [J]. Li Wenbo, Saigo Hiroto, Tong Bin. Journal of Intelligent Information Systems, 2021, Issue 3.
3. Social media mining for product planning: A product opportunity mining approach based on topic modeling and sentiment analysis [J]. Jeong Byeongki, Yoon Janghyeok, Lee Jae-Min. International Journal of Information Management, 2019, Issue Octa.
4. Mining Contentious Documents Using an Unsupervised Topic Model Based Approach [C]. Trabelsi Amine, Zaiane Osmar R. IEEE International Conference on Data Mining, 2014.
5. Text association mining with cross-sentence inference, structure-based document model and multi-relational text mining [D]. Thaicharoen, Supphachai. 2009.
6. Synonym Topic Model and Predicate-Based Query Expansion for Retrieving Clinical Documents [O]. Qing T. Zeng, Doug Redd, Thomas Rindflesch. 2012.
7. A Hybrid Method for Manufacturing Text Mining Based on Document Clustering and Topic Modeling Techniques [O]. Shotorbani, Peyman; Ameri, Farhad; Kulvatunyou, Boonserm. 2016.
