
Identifying Rare Classes with Sparse Training Data


Abstract

Building models and learning patterns from collected data are essential tasks for decision making and the dissemination of knowledge. One common way to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate one. This is especially true in the biological sciences, where data are hard to produce and error-prone. Through empirical results, this paper shows the challenges of building an accurate classifier from a sparse biological training dataset. Our findings indicate inadequacies in well-known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training samples are very limited.
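The abstract names seeded k-Means without describing it. As a rough illustration only, the sketch below follows the commonly described form of the technique: the few labeled samples ("seeds") set the initial centroids, after which ordinary k-means iterations run over all points. It is not the paper's implementation; the function name `seeded_kmeans`, its parameters, and the helper functions are hypothetical, and the paper's data, distance measure, and settings are not given here.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    points: List[Point],
    seeds: Dict[str, List[int]],
    max_iter: int = 100,
) -> Tuple[List[str], List[List[float]]]:
    """Cluster `points`, using labeled seeds only to initialize centroids.

    `seeds` maps each class label to the indices of its labeled points.
    Returns one predicted label per point and the final centroids.
    """
    labels = sorted(seeds)
    # Initialization: one centroid per class, the mean of that class's seeds.
    centroids = [_mean([points[i] for i in seeds[lab]]) for lab in labels]
    assignment = [-1] * len(points)

    for _ in range(max_iter):
        # Assignment step: every point (seeds included) joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: _sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: recompute each centroid; keep the old one if a cluster empties.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = _mean(members)

    return [labels[a] for a in assignment], centroids
```

In this form the seed labels are not enforced after initialization, so a seed can drift to another cluster; a variant that keeps seed labels fixed is usually called constrained k-means. With only one or two seeds per rare class, the initial centroid depends heavily on which samples happened to be labeled, which is one way sparse training data can limit the result.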
