
Nearest-neighbor method using multiple neighborhood similarities for social media data mining


Abstract

Nearest-neighbor (NN) approaches have been applied to large-scale, real-world image data mining. However, three disadvantages prevent them from being applied as widely as other machine learning methods: (i) their performance is inferior on small datasets; (ii) their performance degrades on high-dimensional data; (iii) they depend heavily on the chosen feature and distance measure. In this paper, we try to overcome these three intrinsic weaknesses by taking the abundant and diversified content of social media images into account. First, we propose a novel neighborhood similarity measure that encodes both local density information and semantic information, and therefore generalizes better than the original image-to-image similarity. Second, to enhance scalability, we adopt kernelized Locality Sensitive Hashing (KLSH) to conduct approximate nearest-neighbor search using a set of kernels calculated on several complementary image features. Finally, to enhance robustness on diversified genres of images, we propose to fuse the discriminative power of different features by combining multiple neighborhood similarities, calculated on different features/kernels, with the entire set of nearest labeled and unlabeled images retrieved via the hashing systems. Experimental results on visual categorization on Caltech-256 and two social media databases show the advantage of our method over traditional NN methods that use the labeled data only.
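The abstract does not define the neighborhood similarity or the hashing setup, so the following is only a rough sketch of the general idea of fusing similarities computed on several feature types inside a nearest-neighbor classifier. It is not the authors' method: plain averaged RBF kernel similarity stands in for the paper's neighborhood similarity, exact brute-force ranking stands in for KLSH, and only labeled data is used. The names `rbf_kernel`, `fused_knn_predict`, `gamma`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def rbf_kernel(x, y, gamma=1.0):
    """RBF similarity between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def fused_knn_predict(query_views, train_views, labels, k=3, gamma=1.0):
    """Classify one query by averaging per-feature kernel similarities.

    query_views: one feature vector per feature type.
    train_views: for each feature type, the list of training vectors.
    labels:      class label of each training item.
    Assumption: a simple mean over feature types replaces the paper's
    neighborhood similarity, which the abstract does not specify.
    """
    n_items = len(labels)
    n_views = len(query_views)
    fused = []
    for i in range(n_items):
        sims = [rbf_kernel(query_views[v], train_views[v][i], gamma)
                for v in range(n_views)]
        fused.append(sum(sims) / n_views)
    # Exact search over all labeled items (KLSH would approximate this step).
    top = sorted(range(n_items), key=lambda i: fused[i], reverse=True)[:k]
    # Similarity-weighted vote among the k most similar items.
    votes = defaultdict(float)
    for i in top:
        votes[labels[i]] += fused[i]
    return max(votes, key=votes.get)


# Toy data: four labeled items described by two hypothetical feature types.
train_views = [
    [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]],  # feature type A
    [[0.0], [0.1], [1.0], [0.8]],                      # feature type B
]
labels = ["cat", "cat", "dog", "dog"]
prediction = fused_knn_predict([[0.95, 1.0], [0.9]], train_views, labels, k=3)
print(prediction)
```

Averaging is the simplest possible fusion rule; a weighted combination per feature type would be a natural variation, but the abstract gives no detail on how the paper weights its similarities.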

Bibliographic record

  • Source
    Neurocomputing | 2012, issue 2012 | pp. 105-116 | 12 pages
  • Author affiliations

    Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China;

    Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China, Graduate University, Chinese Academy of Sciences, Beijing 100049, China;

    Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China;

    Department of Computer Science, University of Texas at San Antonio, TX 78249, USA;

    Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    nearest neighbor method; multiple neighborhood similarity; visual categorization; locality sensitive hashing;

