首页> 外文会议>International conference on management of data >Exact Indexing for Support Vector Machines
【24h】

Exact Indexing for Support Vector Machines

机译:用于支持向量机的精确索引

获取原文

摘要

SVM (Support Vector Machine) is a well-established machine learning methodology popularly used for classification, regression, and ranking. Recently SVM has been actively researched for rank learning and applied to various applications including search engines or relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once learning a function F or formulating the query, processing the query to find top-k results requires evaluating the entire database by F. So far, there exists no exact indexing solution for SVM functions. Existing top-k query processing algorithms are not applicable to the machine-learned ranking functions, as they often make restrictive assumptions on the query, such as linearity or monotonicity of functions. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (SVM feature space) on which the index must be built. Existing kernel indexing methods return approximate results or fix kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first propose key geometric properties of the kernel space - ranking instability and ordering stability - which is crucial for building indices in the kernel space. Based on them, we develop an index structure iKernel and processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall producing 1 ~5% of evaluation ratio on large data sets. According to our best knowledge, iKernel is the first indexing solution that rinds exact top-k results of SVM functions without a full scan of data set.
机译:SVM(支持向量机)是一种既定的机器学习方法,普遍用于分类,回归和排名。最近SVM已积极研究等级学习,并应用于包括搜索引擎或相关反馈系统的各种应用程序。这种系统中的查询是SVM学习的排名函数F。一旦学习函数f或制定查询,处理查询以查找顶-K结果需要通过f来评估整个数据库。到目前为止,没有对SVM函数的精确索引解决方案。现有的Top-K查询处理算法不适用于机器学习的排名功能,因为它们通常会对查询进行限制性假设,例如线性或功能的单调性。现有的基于度量或基于参考的索引方法也不适用,因为数据点在必须构建索引的内核空间(SVM功能空间)中是不可见的。现有内核索引方法返回近似结果或修复内核参数。本文提出了具有不同内核参数的SVM功能的精确索引解决方案。我们首先提出了内核空间 - 排名不稳定性和排序稳定性的关键几何属性 - 这对于内核空间中的构建指数至关重要。基于它们,我们开发了索引结构Ikernel和处理算法。然后,我们在内核空间中呈现聚类技术以增强索引的修剪效果。根据我们的实验,Ikernel在大数据集中高效地生产1〜5%的评估比率。根据我们的最佳知识,Ikernel是第一个索引解决方案,它将SVM功能的精确顶部K结果进行了完全扫描数据集。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号