IEEE International Conference on Multimedia and Expo

Exploiting side information in distance dependent Chinese restaurant processes for data clustering

Abstract

Multimedia content often comes with weakly annotated data such as tags, links, and interactions. This weakly annotated data is called side information: auxiliary information that provides hints about the link structure of the data. Most clustering algorithms use only the pure data. A model that combines pure data with side information, such as images with tags or documents with keywords, can better capture the underlying structure of the data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian nonparametric model, the distance dependent Chinese restaurant process (DD-CRP). When side information takes the form of subsets of discrete labels, our algorithm embeds the affinity of this information into the decay function of the DD-CRP. Distance can therefore be measured from arbitrary side information rather than only from the spatial layout or time stamps of observations. For noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, which avoids the adverse effects of such side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves clustering performance.
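
To make the idea of embedding label affinity into the decay function concrete, the following is a minimal Python sketch of a DD-CRP prior over customer links. It is not the authors' implementation: the abstract does not give the distance or decay function, so the Jaccard distance between tag sets, the exponential decay with a `scale` parameter, and the rule used for items with missing tags are all assumptions made for illustration. The function names (`jaccard_distance`, `link_probabilities`, `clusters_from_links`) are hypothetical.

```python
import math


def jaccard_distance(tags_a, tags_b):
    """Jaccard distance between two label sets (assumed distance, not from the paper)."""
    union = tags_a | tags_b
    if not union:
        return 1.0
    return 1.0 - len(tags_a & tags_b) / len(union)


def link_probabilities(i, tags, alpha=1.0, scale=0.5):
    """Prior probability that item i links to each item j under a DD-CRP.

    A self-link (j == i) has weight alpha; any other link has weight f(d_ij).
    """
    weights = []
    for j in range(len(tags)):
        if j == i:
            weights.append(alpha)
        elif not tags[i] or not tags[j]:
            # Assumed fallback for missing side information: a constant decay
            # over earlier items only, which is the setting under which a
            # DD-CRP behaves like the traditional (sequential) CRP.
            weights.append(1.0 if j < i else 0.0)
        else:
            # Assumed exponential decay of the tag-set distance.
            weights.append(math.exp(-jaccard_distance(tags[i], tags[j]) / scale))
    total = sum(weights)
    return [w / total for w in weights]


def clusters_from_links(links):
    """Cluster labels are the connected components of the link graph."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]
```

With `tags = [{"sky", "sea"}, {"sky", "sea"}, {"car"}, set()]`, item 0 is more likely to link to item 1 (identical tags) than to item 2 (disjoint tags), while the untagged item 3 spreads its probability evenly over the earlier items and a self-link, as in a traditional CRP with `alpha = 1`. This covers only the prior; a full clustering model would combine it with a data likelihood during inference.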