Conference: IEEE International Conference on Innovations in Intelligent Systems and Applications

Non-parametric Discovery of Topics and Communities in Distributed and Streaming Environments


Abstract

Several recent works have focused on improving latent-space-based modeling of streaming count data, such as streaming textual feeds and evolving social networks. However, many of these models do not inherently scale to large data sets, nor do they accommodate drift over time in the inferred latent factors (e.g., topics, social groups). In addition, the functional form of the distributed and streaming processing architectures recently introduced in industry constrains how dynamic algorithms can be expressed; for example, they must be inherently stateful. We propose a comprehensive and flexible approach to distributed and dynamic inference of Bayesian count factorization models, focusing on a recently introduced nonparametric, joint topic-community factorization model called Joint Gamma Process Poisson Factorization (JGPPF). The method is illustrated in an Apache Spark implementation using twelve years of U.S. Senate voting records.
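The stateful constraint mentioned in the abstract can be illustrated with a much simpler model than JGPPF. The sketch below is not the authors' algorithm: it tracks a single Poisson rate with a conjugate Gamma posterior and expresses streaming inference as a pure state-transition function folded over mini-batches, which is roughly the shape that stateful streaming operators expect. The names `GammaState`, `update`, `run_stream`, and the `decay` forgetting factor are all assumptions introduced for illustration; discounting old evidence is one plausible way to let an estimate follow drift, not necessarily the one used in the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GammaState:
    """Gamma(shape, rate) posterior over a Poisson rate; this is the carried state."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate


def update(state: GammaState, batch: List[int], decay: float = 0.9) -> GammaState:
    """Pure state transition: discount old evidence, then absorb one mini-batch.

    Gamma-Poisson conjugacy: n observed counts add sum(counts) to the shape
    and n to the rate. `decay` in (0, 1] is a hypothetical forgetting factor;
    decay=1.0 recovers the ordinary (non-drifting) posterior.
    """
    return GammaState(
        shape=decay * state.shape + sum(batch),
        rate=decay * state.rate + len(batch),
    )


def run_stream(prior: GammaState, batches: Iterable[List[int]], decay: float = 0.9) -> GammaState:
    """Fold `update` over the stream, keeping only the current state between batches."""
    state = prior
    for batch in batches:
        state = update(state, batch, decay)
    return state


if __name__ == "__main__":
    # Toy stream whose underlying rate jumps from about 1 to about 9.
    stream = [[1] * 10, [9] * 10]
    prior = GammaState(shape=1.0, rate=1.0)
    print("no forgetting:", run_stream(prior, stream, decay=1.0).mean)
    print("with forgetting:", run_stream(prior, stream, decay=0.1).mean)
```

With `decay=1.0` the estimate averages over the whole history, while a smaller `decay` weights recent batches more heavily and so tracks the shifted rate more closely. Because `update` takes the previous state and a batch and returns a new state, it could in principle be plugged into a keyed stateful operator in a streaming engine, though that wiring is not shown here.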
