Applied Soft Computing

Information entropy based sample reduction for support vector data description



Abstract

Support vector data description (SVDD) is one of the most attractive methods in one-class classification (OCC), especially for novelty detection problems. SVDD handles classification with a large amount of target data and few outlier data. However, the huge computational complexity of kernel mapping makes it hard to apply in practice as the number of target data grows. To reduce the size of the training set, we introduce a method called information entropy based sample reduction for support vector data description (IESRSVDD). In this method, an information entropy is computed for the distribution of each data sample: the distances between pairs of samples are used to evaluate the probability of uncertainty for each sample. Samples with higher entropy values are considered to lie near the boundary of the data distribution in kernel space and are likely to become support vectors, so all samples whose entropy falls below a threshold are excluded. An updated objective function of the conventional SVDD is then trained on the reduced sample set. The innovative highlights of the proposed IESRSVDD are: (i) reducing the training samples based on information entropy, (ii) introducing sample reduction to SVDD to speed up the training process, and (iii) validating and analyzing the feasibility and effectiveness of IESRSVDD. Experimental results show that the proposed method achieves a faster training speed by reducing the scale of the training set: computing time is reduced by 50-75% while classification accuracy is improved. (C) 2018 Elsevier B.V. All rights reserved.
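The abstract gives enough of the pipeline to sketch the reduction step. Below is a minimal Python sketch: each sample's entropy H_i = -sum_j p_ij log p_ij is computed over a distance-based distribution, assuming a softmax over negative squared pairwise distances as the probability model and a quantile rule for the entropy threshold (neither is fixed by the abstract). scikit-learn's OneClassSVM with an RBF kernel stands in for SVDD, to which it is known to be equivalent for that kernel; names like `entropy_scores` and `iesr_svdd` are illustrative, not from the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import pairwise_distances

def entropy_scores(X, gamma=1.0):
    """Per-sample information entropy over a distance-based distribution.

    For each sample i, distances to all other samples are turned into a
    probability distribution p_ij (softmax of negative squared distances,
    an assumed model) and the Shannon entropy H_i = -sum_j p_ij*log(p_ij)
    is returned. High entropy ~ boundary-like sample.
    """
    D = pairwise_distances(X, metric="sqeuclidean")
    np.fill_diagonal(D, np.inf)                      # exclude self-distance
    D -= D.min(axis=1, keepdims=True)                # stabilize the softmax
    P = np.exp(-gamma * D)
    P /= P.sum(axis=1, keepdims=True)                # row-wise normalization
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -np.nansum(P * np.log(P), axis=1)        # p*log(p) -> 0 as p -> 0
    return H

def iesr_svdd(X, keep_quantile=0.5, gamma=1.0, nu=0.1):
    """Drop low-entropy (interior) samples, then fit a one-class model.

    OneClassSVM with an RBF kernel is used as a stand-in for SVDD.
    """
    H = entropy_scores(X, gamma=gamma)
    threshold = np.quantile(H, keep_quantile)        # assumed quantile threshold
    X_reduced = X[H >= threshold]                    # keep high-entropy samples
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_reduced)
    return model, X_reduced

# Example: roughly half of the target samples are dropped before training.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
model, X_red = iesr_svdd(X_train, keep_quantile=0.5)
print(f"trained on {len(X_red)} of {len(X_train)} samples")
```

With `keep_quantile=0.5` the training set is halved before the kernel machine is fit, which is the mechanism behind the 50-75% reduction in computing time the abstract reports; the exact entropy model and threshold selection in the paper may differ from this sketch.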
