Journal: International Journal of Machine Learning and Cybernetics

Top K representative: a method to select representative samples based on K nearest neighbors

Abstract

Short text categorization relies on supervised learning, which requires a large amount of labeled training data and therefore considerable human labor. Active learning reduces the number of samples that must be labeled manually by selecting the most representative samples to stand in for the entire training set. Uncertainty sampling is one active learning strategy, but it is easily affected by outliers. This paper proposes a new sampling method, Top K representative (TKR), to address the problem caused by outliers. However, TKR optimization is an NP-hard (nondeterministic polynomial-time hard) problem, which makes exact solutions challenging to obtain. To tackle this, we propose a greedy algorithm that obtains approximate solutions and thereby achieves high performance. Experiments show that the proposed sampling method outperforms existing methods in terms of efficiency.
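The abstract does not give the TKR objective or the details of its greedy procedure, so the following is only an illustrative sketch of the general idea: greedily choosing samples that represent many other samples through K-nearest-neighbor relations. The coverage objective used here (a sample "represents" itself plus every point that lists it among its k nearest neighbors) is an assumption made for illustration, not the paper's formulation. The names `knn_indices`, `greedy_representatives`, `k`, and `budget` are likewise hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """For each row of X, return the indices of its k nearest neighbors (self excluded)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def greedy_representatives(X, k, budget):
    """Greedy approximation to a coverage-style selection problem.

    ASSUMPTION (not from the paper): sample i represents itself and every
    sample j that has i among its k nearest neighbors. Each step picks the
    sample that represents the most not-yet-represented samples.
    """
    n = len(X)
    neighbors = knn_indices(X, k)

    # covers[i] = set of samples represented by i under the assumption above.
    covers = [{i} for i in range(n)]
    for j in range(n):
        for i in neighbors[j]:
            covers[int(i)].add(j)

    covered, chosen, chosen_set = set(), [], set()
    for _ in range(min(budget, n)):
        best, best_gain = None, -1
        for i in range(n):
            if i in chosen_set:
                continue
            gain = len(covers[i] - covered)  # newly represented samples
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        chosen_set.add(best)
        covered |= covers[best]
    return chosen
```

Under this assumed objective, an isolated point rarely appears in other points' neighbor lists, so it gains little coverage and tends not to be selected. This is one plausible way a neighbor-based criterion could resist outliers, in contrast to pure uncertainty sampling.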