Venue: Pacific-Asia conference on knowledge discovery and data mining

Learning Conditional Latent Structures from Multiple Data Sources

Abstract

Data are usually present in heterogeneous sources. When dealing with multiple data sources, existing models often treat them independently and thus cannot explicitly model the correlation structures among data sources. To address this problem, we propose a fully Bayesian nonparametric approach to model correlation structures among multiple heterogeneous datasets. The proposed framework first induces a mixture distribution over the primary data source using hierarchical Dirichlet processes (HDP). Conditioned on each atom (group) discovered in the previous step, the context data sources are mutually independent, and each is generated from a hierarchical Dirichlet process. In each specific application, which covariates constitute content and which constitute context(s) is determined by the nature of the data. We also derive an efficient inference procedure and exploit the conditional independence structure to propose a (conditional) parallel Gibbs sampling scheme. We apply our model to the problem of latent activity discovery in pervasive computing using mobile data. We show the advantage of utilizing multiple data sources in terms of both exploratory analysis and quantitative clustering performance.
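The two-level generative structure described in the abstract can be illustrated with a small simulation. The following is only a rough sketch under several assumptions that are not stated in the abstract: the Dirichlet processes are truncated to finite sizes (`K` content atoms, `M` context atoms), there is a single context source, both sources use one-dimensional Gaussian likelihoods, and all hyperparameter values and the names `stick_breaking` and `sample_corpus` are hypothetical. It is not the authors' implementation, and it covers sampling from the model only, not the parallel Gibbs inference scheme.

```python
import numpy as np


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking (GEM) weights that sum to one."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_corpus(n_groups=3, n_per_group=40, K=10, M=8,
                  gamma=1.0, alpha=2.0, seed=0):
    """Draw content and context observations from a truncated sketch.

    Assumed structure (illustrative only):
      content: finite approximation of an HDP mixture over K atoms
      context: for each content atom k, a mixture over M shared context
               atoms, with weights tied together by a second hierarchy
    """
    rng = np.random.default_rng(seed)
    eps = 1e-6  # keeps Dirichlet parameters strictly positive

    # Content level: global weights, then per-group weights around them.
    beta = stick_breaking(gamma, K, rng)
    pi = rng.dirichlet(alpha * beta + eps, size=n_groups)  # (n_groups, K)
    phi = rng.normal(0.0, 5.0, size=K)                     # content atoms

    # Context level: one weight vector per content atom k.
    eta = stick_breaking(gamma, M, rng)
    tau = rng.dirichlet(alpha * eta + eps, size=K)         # (K, M)
    psi = rng.normal(0.0, 5.0, size=M)                     # context atoms

    groups = []
    for j in range(n_groups):
        z = rng.choice(K, size=n_per_group, p=pi[j])       # content labels
        # Context label depends only on the content atom that was chosen.
        s = np.array([rng.choice(M, p=tau[k]) for k in z])
        x = rng.normal(phi[z], 1.0)                        # content data
        y = rng.normal(psi[s], 1.0)                        # context data
        groups.append({"z": z, "s": s, "content": x, "context": y})
    return groups


if __name__ == "__main__":
    corpus = sample_corpus()
    print(len(corpus), corpus[0]["content"].shape)
```

Because each context label is drawn only from the weight vector of its content atom, additional context sources could be sampled independently of one another once the content labels are fixed, which is the conditional independence the abstract refers to.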
