IEEE Transactions on Knowledge and Data Engineering

KDX: an indexer for support vector machines


Abstract

Support vector machines (SVMs) have been adopted by many data mining and information-retrieval applications for learning a mining or query concept and then retrieving the "top-k" best matches to that concept. However, when the data set is large, naively scanning the entire data set to find the top matches is not scalable. In this work, we propose a kernel indexing strategy that substantially prunes the search space and thus improves the performance of top-k queries. Our kernel indexer (KDX) takes advantage of the underlying geometric properties and quickly converges on an approximate set of top-k instances of interest. More importantly, once the kernel (e.g., the Gaussian kernel) has been selected and the indexer has been constructed, the indexer can work with different kernel-parameter settings (e.g., γ and σ) without compromising performance. Through theoretical analysis and empirical studies on a wide variety of data sets, we demonstrate KDX to be very effective. An earlier version of this paper appeared in the 2005 SIAM International Conference on Data Mining. This version differs from the earlier one in four respects: it provides a detailed cost analysis under different scenarios, designed to meet varying requirements for accuracy, speed, and space; it develops an approach for inserting and deleting instances; it presents the specific computations, and the geometric properties they rely on, for performing those updates; and it provides detailed algorithms for each operation needed to create and use the index structure.
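For context, the naive full scan that the abstract describes as not scalable can be sketched as follows. This is only an illustration of the baseline, not of KDX itself, whose index structure is not described in the abstract. The function names, the toy support vectors, the coefficients, and the value of `gamma` are all hypothetical, and the Gaussian kernel is assumed to take the form exp(-γ‖x − y‖²).

```python
import heapq
import math


def gaussian_kernel(x, y, gamma):
    """Assumed Gaussian kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, coeffs, bias, gamma):
    """SVM score of x: weighted sum of kernel values against the support vectors."""
    return bias + sum(
        c * gaussian_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coeffs)
    )


def top_k_linear_scan(data, support_vectors, coeffs, bias, gamma, k):
    """Baseline top-k query: score every instance, keep the k highest.

    Each query costs one kernel evaluation per (instance, support vector)
    pair, which is the cost an index tries to avoid.
    """
    scored = (
        (decision_value(x, support_vectors, coeffs, bias, gamma), i)
        for i, x in enumerate(data)
    )
    return [i for _, i in heapq.nlargest(k, scored)]


# Hypothetical toy model: one positive and one negative support vector.
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
coeffs = [1.0, -1.0]  # signed weights (alpha_i * y_i), made up for this example
bias = 0.0
data = [(0.1, 0.0), (3.9, 4.0), (1.0, 1.0), (0.0, 0.5), (5.0, 5.0)]

top2 = top_k_linear_scan(data, support_vectors, coeffs, bias, gamma=0.5, k=2)
print(top2)
```

Because every instance is scored on every query, the work grows linearly with the size of the data set, which is the motivation for pruning the search space with an index.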