International Journal of Machine Learning and Cybernetics

Top K representative: a method to select representative samples based on K nearest neighbors

Abstract

Short text categorization involves a supervised learning process that requires a large amount of labeled training data and therefore considerable human labeling effort. Active learning reduces the number of samples that must be labeled manually in traditional supervised learning problems: instead of labeling the entire training set, it selects the most representative samples to stand in for it. Uncertainty sampling is one active learning strategy, but it is easily affected by outliers. In this paper, we propose a new sampling method, Top K representative (TKR), to address the problems caused by outliers. However, optimizing TKR is NP-hard (non-deterministic polynomial-time hard), so exact solutions are difficult to obtain. To tackle this, we propose an approach based on a greedy algorithm that obtains approximate solutions and thereby achieves high performance. Experiments show that the proposed sampling method outperforms existing methods in terms of efficiency.
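The abstract does not state the TKR objective, so the following is only a hypothetical illustration of how a K-nearest-neighbor representativeness criterion can be optimized greedily. It assumes a max-coverage-style objective: a candidate sample "covers" itself and every sample that lists it among its K nearest neighbors (its reverse-kNN set), and we want the budgeted set that covers the most samples. Maximum coverage is NP-hard, and the standard greedy algorithm (repeatedly adding the candidate with the largest marginal gain) is known to achieve a (1 − 1/e) approximation for it; whether any such guarantee carries over to TKR depends on the paper's actual objective. The names `knn_indices`, `greedy_top_k_representatives`, `k_neighbours`, and `budget` are our own, not the authors'.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """For each point, return the indices of its k nearest neighbours (excluding itself)."""
    neighbours = []
    for i, p in enumerate(points):
        # Brute force: sort all other points by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(others[:k])
    return neighbours


def greedy_top_k_representatives(
    points: Sequence[Point], k_neighbours: int, budget: int
) -> List[int]:
    """Hypothetical greedy selection of `budget` representative samples.

    Assumption (not from the paper): candidate j covers itself plus every
    sample that has j among its k nearest neighbours. Isolated outliers appear
    in few kNN lists, so they cover little and are selected late, if at all.
    """
    n = len(points)
    coverage: List[Set[int]] = [{j} for j in range(n)]
    for i, nbrs in enumerate(knn_indices(points, k_neighbours)):
        for j in nbrs:
            coverage[j].add(i)  # i "votes" for j as a representative

    covered: Set[int] = set()
    selected: List[int] = []
    for _ in range(min(budget, n)):
        # Greedy step: take the unselected candidate that covers the most
        # not-yet-covered samples (ties go to the lowest index).
        best = max(
            (j for j in range(n) if j not in selected),
            key=lambda j: len(coverage[j] - covered),
        )
        selected.append(best)
        covered |= coverage[best]
    return selected
```

For example, on two tight four-point clusters, `(0,0),(0,1),(1,0),(1,1)` and `(10,10),(10,11),(11,10),(11,11)`, plus a distant outlier `(50,50)`, calling `greedy_top_k_representatives(points, 3, 2)` picks one point from each cluster and not the outlier, because no sample lists the outlier among its three nearest neighbors. The brute-force neighbor search is O(n² log n) and is meant only for illustration; a spatial index would be needed at realistic scale.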