首页> 外文会议>International conference on management of data >Exact Indexing for Support Vector Machines
【24h】

Exact Indexing for Support Vector Machines

机译:支持向量机的精确索引

获取原文

摘要

SVM (Support Vector Machine) is a well-established machine learning methodology popularly used for classification, regression, and ranking. Recently SVM has been actively researched for rank learning and applied to various applications including search engines or relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once learning a function F or formulating the query, processing the query to find top-k results requires evaluating the entire database by F. So far, there exists no exact indexing solution for SVM functions. Existing top-k query processing algorithms are not applicable to the machine-learned ranking functions, as they often make restrictive assumptions on the query, such as linearity or monotonicity of functions. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (SVM feature space) on which the index must be built. Existing kernel indexing methods return approximate results or fix kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first propose key geometric properties of the kernel space - ranking instability and ordering stability - which is crucial for building indices in the kernel space. Based on them, we develop an index structure iKernel and processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall producing 1 ~5% of evaluation ratio on large data sets. According to our best knowledge, iKernel is the first indexing solution that rinds exact top-k results of SVM functions without a full scan of data set.
机译:SVM(支持向量机)是一种完善的机器学习方法,广泛用于分类,回归和排名。最近,已经对SVM进行了积极的研究以进行等级学习,并将其应用于包括搜索引擎或相关性反馈系统在内的各种应用程序。在这样的系统中的查询是由SVM学习的排名函数F。一旦学习了函数F或制定了查询,对查询进行处理以找到前k个结果,就需要通过F评估整个数据库。到目前为止,还没有针对SVM函数的精确索引解决方案。现有的top-k查询处理算法不适用于机器学习的排名函数,因为它们通常会对查询进行限制性假设,例如函数的线性或单调性。现有的基于度量或基于引用的索引方法也不适用,因为数据点在必须建立索引的内核空间(SVM功能空间)中是不可见的。现有的内核索引方法返回近似结果或修复内核参数。本文针对具有不同内核参数的SVM函数提出了一种精确的索引解决方案。我们首先提出内核空间的关键几何特性-排序不稳定性和顺序稳定性-这对于在内核空间中建立索引至关重要。基于它们,我们开发了索引结构iKernel和处理算法。然后,我们在内核空间中提出聚类技术,以增强索引的修剪效果。根据我们的实验,在大数据集上,iKernel总体上可以产生1〜5%的评估比率,非常有效。据我们所知,iKernel是第一个索引解决方案,可以在不对数据集进行全面扫描的情况下获取SVM函数的确切前k个结果。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号