Journal: SIGKDD Explorations

Mining Information from Heterogeneous Sources: A Topic Modeling Approach

Abstract

In recent years, the phenomenal growth and popularity of social media, news, and discussion websites have led to a vast number of information sources available online. These sources generate massive amounts of real-time content on a daily basis, making it increasingly difficult to glean true and useful information from them. Automatically categorizing and compressing important contextual information from these sources is crucial for tasks such as web document classification and summarization. In this paper, we therefore propose a novel topic modeling framework, Probabilistic Source LDA, which is designed to handle heterogeneous sources. Probabilistic Source LDA can compute latent topics for each source and maintain topic-topic correspondence between sources while retaining the distinct identity of each individual source. It thus helps to mine and organize correlated information from many different sources, and it also aids in automatically reducing noise and redundancy in the information gathered. Using real data on the 2012 US elections, we demonstrate that Probabilistic Source LDA can extract highly relevant latent topics while maintaining topic-topic congruence between different sources.