Identifying Rare Classes with Sparse Training Data

Abstract

Building models and learning patterns from a collection of data are essential tasks for decision making and the dissemination of knowledge. A common way to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate classifier. This is especially true in the biological sciences, where data are hard to produce and error-prone. Through empirical results, this paper shows the challenges of building an accurate classifier from a sparse biological training dataset. Our findings indicate the inadequacies of well-known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training data samples are very limited.
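The abstract names seeded k-Means without describing it. As a rough illustration only, the sketch below follows the commonly described form of the technique: the few labeled samples initialize one centroid per class, after which ordinary k-means iterations run over all points. The paper's exact variant is not given here, so the function name `seeded_kmeans`, its parameters, and the toy data are all assumptions made for this example.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def mean(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def seeded_kmeans(points, seeds, n_iter=100):
    """Cluster `points`, initializing one centroid per class from labeled seeds.

    points: list of feature vectors (labeled and unlabeled together).
    seeds:  dict mapping a class label to its few labeled feature vectors.
    Returns (labels, centroids); labels[i] is the class assigned to points[i].
    Hypothetical helper for illustration, not the paper's implementation.
    """
    classes = sorted(seeds)
    # Seeding step: each initial centroid is the mean of that class's seeds.
    centroids = {c: mean(seeds[c]) for c in classes}
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(classes, key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each centroid; keep the old one if empty.
        for c in classes:
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Toy data invented for illustration: a common class near the origin and
# a rare class near (5, 5), with a single labeled seed per class.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
          [5.0, 5.1], [5.2, 4.9]]
seeds = {"common": [[0.0, 0.1]], "rare": [[5.0, 5.1]]}
labels, centroids = seeded_kmeans(points, seeds)
```

In this form the seeds only guide initialization, so a seed's own label may change in later iterations. A variant that keeps seed labels fixed is also described in the literature and may behave differently when labeled samples are noisy.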