BMC Bioinformatics

A method for automatically extracting infectious disease-related primers and probes from the literature


Abstract

Background: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for a range of tasks, some of which relate to diagnosing infectious diseases and prescribing treatment for them. The biological literature is the main source of empirically validated primer and probe sequences, so it is increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. The phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information.

Results: We tested our approach on a test set of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The evaluation shows that our approach is suitable for automatically extracting DNA sequences, achieving a precision of 97.98% and a recall of 95.77%. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences that had been assigned a correct organism name.

Conclusions: We believe that the proposed method can facilitate routine tasks for biomedical researchers who use molecular methods to diagnose infectious diseases and prescribe treatment for them. In addition, the method can be extended to detect and extract other biological sequences from the literature. The extracted information can also be used to update existing primer/probe databases or to create new databases from scratch.
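To make phase (2) more concrete, the following is a minimal sketch of what a finite state machine-based recognizer for candidate sequences might look like. It is not the authors' implementation: the state names, the IUPAC alphabet in NUCLEOTIDES, the separator set, and the MIN_LENGTH cutoff are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    OUTSIDE = auto()      # not reading anything of interest
    IN_SEQUENCE = auto()  # reading a run of nucleotide letters
    IN_WORD = auto()      # inside an ordinary word; skip until it ends


# Assumed alphabet: upper-case IUPAC nucleotide codes.
NUCLEOTIDES = frozenset("ACGTURYKMSWBDHVN")
# Assumed separators that may appear inside a printed sequence.
SEPARATORS = frozenset(" -")
# Hypothetical minimum length for a primer/probe candidate.
MIN_LENGTH = 15


def find_candidate_sequences(text: str, min_length: int = MIN_LENGTH) -> List[str]:
    """Return nucleotide runs of at least min_length found in text."""
    candidates: List[str] = []
    committed: List[str] = []  # letters from chunks confirmed as nucleotide-only
    chunk: List[str] = []      # letters of the chunk currently being read
    state = State.OUTSIDE

    def emit() -> None:
        # Close the current candidate and keep it only if it is long enough.
        sequence = "".join(committed)
        if len(sequence) >= min_length:
            candidates.append(sequence)
        committed.clear()

    # A trailing newline acts as a sentinel so the last candidate is flushed.
    for ch in text + "\n":
        if state is State.IN_WORD:
            if not ch.isalnum():
                state = State.OUTSIDE
            continue
        if ch in NUCLEOTIDES:
            chunk.append(ch)
            state = State.IN_SEQUENCE
        elif ch.isalnum():
            # The chunk turned out to be part of an ordinary word: drop it.
            chunk.clear()
            emit()
            state = State.IN_WORD
        elif ch in SEPARATORS:
            # A space or hyphen may continue the same sequence.
            committed.extend(chunk)
            chunk.clear()
        else:
            # Any other punctuation or whitespace ends the candidate.
            committed.extend(chunk)
            chunk.clear()
            emit()
            state = State.OUTSIDE
    return candidates


if __name__ == "__main__":
    sample = "The forward primer 5'-ACGTACGTAC GTACGTACGT-3' was used with DNA And RNA."
    print(find_candidate_sequences(sample))
```

A recognizer this simple will still accept some upper-case English text as a candidate, because many letters are valid IUPAC codes. That kind of false positive is what a later refinement step, such as the rule-based phase (3) described above, would need to filter out.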
