
Novelty generating machine

Abstract

Novelty detection is one of the primary tasks in data mining and machine learning. The task is to differentiate unseen outliers from normal patterns. Although novelty detection has been studied for many years and has found a wide range of applications, identifying outliers remains very challenging because outliers are absent or scarce. We observe several characteristics of outliers and normal patterns. First, normal patterns are usually grouped together and form clusters in high-density regions of the data. Second, outliers are very different from the normal patterns and therefore lie far away from them. Third, the number of outliers is very small compared with the size of the dataset. Based on these observations, we can envisage that the decision boundary between outliers and normal patterns usually lies in low-density regions of the data, which is referred to as the cluster assumption. The resultant optimization problem takes the form of a mixed integer program. We then present a cutting plane algorithm, together with multiple kernel learning techniques, to solve its convex relaxation. Moreover, we make use of the scarcity of outliers to find a violating solution in the cutting plane algorithm.
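The three observations above (normal patterns cluster in dense regions, outliers lie far from them, and outliers are few) can be illustrated with a minimal sketch. This is not the cutting plane or multiple kernel learning method described in the abstract; it swaps in a plain k-nearest-neighbor distance score as a stand-in for "low density". The function name `knn_outlier_labels`, the parameters `k` and `outlier_fraction`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def knn_outlier_labels(X, k=5, outlier_fraction=0.05):
    """Label the sparsest points as outliers (-1), the rest as normal (+1).

    Hypothetical helper: the score is the distance to the k-th nearest
    neighbor, so points in low-density regions get large scores.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the distance to the point itself (0),
    # so column k is the distance to the k-th nearest other point.
    scores = np.sort(dists, axis=1)[:, k]
    # Observation 3: only a small fraction of the data may be outliers.
    n_out = max(1, int(round(outlier_fraction * n)))
    labels = np.ones(n, dtype=int)
    labels[np.argsort(scores)[-n_out:]] = -1
    return labels, scores

# Synthetic data (assumed): two dense clusters plus three isolated points.
rng = np.random.default_rng(0)
normal = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
])
outliers = np.array([[2.5, 10.0], [-6.0, 2.5], [10.0, -4.0]])
X = np.vstack([normal, outliers])

labels, scores = knn_outlier_labels(X, k=5, outlier_fraction=3 / len(X))
```

Fixing the outlier budget in advance mirrors the scarcity observation: the labels are discrete (+1 or -1) and constrained in number, which hints at why the formulation in the abstract leads to a mixed integer program.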
