Venue: Workshop of the BioLink Special Interest Group on Linking Literature, Information and Knowledge for Biology
Toward Computer-Assisted Text Curation: Classification Is Easy (Choosing Training Data Can Be Hard...)

Abstract

We aim to design a system that classifies scientific articles by whether they report protein characterization experiments, to aid the curators who populate JCVI's Characterized Protein (CHAR) Database of experimentally characterized proteins. We trained two classifiers on small datasets labeled by CHAR curators, and a third classifier on a much larger dataset built from annotations in public databases. Performance varied greatly, in ways we did not anticipate. We describe the datasets and the classification method, and discuss the unexpected results.