Journal: Database

Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database


Abstract

We report on the original integration of ToxiCat (Toxicogenomic Categorizer), an automatic text categorization pipeline that we developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be described as a binary classification task in which a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
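
The ranking step described above can be illustrated with a minimal sketch. It assumes a linear decision function of the kind a linear Support Vector Machine produces, applied to one feature per module. The feature names, weights, bias, and article identifiers below are hypothetical placeholders chosen for illustration; they are not taken from ToxiCat, where the weights would be learned from curated training data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    doc_id: str                 # hypothetical identifier, not a real PMID
    features: Dict[str, float]  # one illustrative feature per pipeline module


# Hypothetical weights and bias. In a trained linear SVM these would come
# from fitting on labelled (relevant / not relevant) articles.
WEIGHTS = {
    "retrieval_score": 1.5,            # e.g. from an information retrieval engine
    "gene_mentions": 0.8,              # e.g. from a gene normalization service
    "chemical_disease_mentions": 1.2,  # e.g. from entity recognizers
}
BIAS = -2.0


def decision_score(article: Article) -> float:
    """Linear decision function: w . x + b (missing features count as 0)."""
    return sum(w * article.features.get(name, 0.0) for name, w in WEIGHTS.items()) + BIAS


def is_relevant(article: Article) -> bool:
    """Binary classification: positive side of the decision boundary."""
    return decision_score(article) > 0.0


def rank_articles(articles: List[Article]) -> List[Article]:
    """Prioritization: sort by decision score, highest first."""
    return sorted(articles, key=decision_score, reverse=True)


articles = [
    Article("doc-A", {"retrieval_score": 0.9, "gene_mentions": 1.0, "chemical_disease_mentions": 1.0}),
    Article("doc-B", {"retrieval_score": 0.2, "gene_mentions": 0.0, "chemical_disease_mentions": 0.5}),
    Article("doc-C", {"retrieval_score": 0.6, "gene_mentions": 1.0, "chemical_disease_mentions": 0.0}),
]

print([a.doc_id for a in rank_articles(articles)])
```

The same score serves both purposes mentioned in the abstract: its sign gives the binary decision, and its magnitude gives the order in which articles would be presented for curation.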
