Journal: Neurocomputing

Dimensionality reduction for documents with nearest neighbor queries

Abstract

Document collections are often stored as sets of sparse, high-dimensional feature vectors. Performing dimensionality reduction (DR) on such high-dimensional datasets for the purposes of visualization presents algorithmic and qualitative challenges for existing DR techniques. We propose the Q-SNE algorithm for dimensionality reduction of document data, combining the scalable probability-based layout approach of BH-SNE with an improved component to calculate approximate nearest neighbors, using the query-based APQ approach that exploits an impact-ordered inverted file. We provide thorough experimental evidence that Q-SNE yields substantial quality improvements for layouts of large document collections with commensurate speed. Our experiments were conducted with six real-world benchmark datasets that range up to millions of documents and terms, and compare against three alternatives for nearest neighbor search and five alternatives for dimensionality reduction. (C) 2014 Elsevier B.V. All rights reserved.
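To make the idea of querying an impact-ordered inverted file more concrete, the following is a minimal Python sketch of approximate nearest neighbor search over sparse document vectors. It is an illustration of the general technique only, not the APQ or Q-SNE implementation: the function names `build_impact_ordered_index` and `approx_nearest_neighbors`, the `budget` parameter, and the use of a dot-product score are all assumptions made for this example, and term weights are assumed to be non-negative (for instance TF-IDF values).

```python
import heapq
from collections import defaultdict


def build_impact_ordered_index(docs):
    """Build an inverted file from a list of sparse {term: weight} vectors.

    Each term maps to a postings list of (weight, doc_id) pairs sorted by
    descending weight, so the highest-impact documents come first.
    """
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((weight, doc_id))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[0], p[1]))
    return dict(index)


def approx_nearest_neighbors(index, query, k, budget):
    """Return up to k (doc_id, partial_score) pairs for a sparse query.

    Postings from all query terms are merged in descending order of
    query_weight * doc_weight, and at most `budget` postings are visited.
    A small budget trades accuracy for speed (hypothetical parameter).
    """
    heap = []  # entries are (-contribution, term, position in postings)
    for term, q_weight in query.items():
        postings = index.get(term)
        if postings:
            heapq.heappush(heap, (-q_weight * postings[0][0], term, 0))

    scores = defaultdict(float)
    visited = 0
    while heap and visited < budget:
        neg_contrib, term, pos = heapq.heappop(heap)
        postings = index[term]
        scores[postings[pos][1]] += -neg_contrib
        visited += 1
        if pos + 1 < len(postings):
            next_weight = postings[pos + 1][0]
            heapq.heappush(heap, (-query[term] * next_weight, term, pos + 1))

    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


docs = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.2, "c": 0.9},
    {"b": 0.8, "c": 0.1},
]
index = build_impact_ordered_index(docs)
neighbors = approx_nearest_neighbors(index, {"a": 1.0, "b": 0.5}, k=2, budget=10)
```

Because the largest contributions are processed first, stopping early tends to keep the documents that dominate the similarity ranking. In a layout pipeline, the resulting neighbor lists would feed the similarity computation of the embedding step.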
