
Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors



Abstract

Motivation: The amount of genomic and proteomic data that is published daily in the scientific literature is outstripping the ability of experimental scientists to stay current. Reviews, the traditional medium for collating published observations, are also unable to keep pace. For some specific classes of information (e.g. sequences and protein structures), obligatory data deposition policies have helped. However, a great deal of other valuable information is spread throughout the literature, hindering coherent access. We are involved in the Molecular Class-Specific Information System (MCSIS) project, a collaborative effort to design and automate the maintenance of protein family databases. The first two databases, the GPCRDB and NucleaRDB, are focused on G protein-coupled receptors (GPCRs) and nuclear hormone receptors (NRs), respectively. The main aim of the MCSIS project is to gather heterogeneous data from across a variety of electronic and literature sources in order to draw new inferences about the target protein families.

Results: We present a computational method that identifies and extracts mutation data from the scientific literature. We focused on the extraction of single point mutations for the GPCR and NR superfamilies. After validation by plausibility filters, the mutation data is integrated into the corresponding MCSIS where it is combined with structural and sequence information already stored in these databases. We extracted and validated 2736 true point mutations from 914 articles on GPCRs and 785 true point mutations from 1094 articles on NRs. The current version of our automated extraction algorithm identifies 49.3% of the GPCR point mutations with a specificity of 87.9%, and 64.5% of the NR point mutations with a specificity of 85.8%. MuteXt routinely analyzes 100 electronic articles in approximately 1 h.
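The abstract does not describe how MuteXt works internally, so the following is only a minimal sketch of the general idea it names: find point-mutation mentions in text, then keep those that pass a plausibility filter. The patterns (`A123T` and `Ala123Thr` styles only), the function names `find_point_mutations` and `is_plausible`, and the filter rule (the wild-type residue must match a reference sequence at the stated position) are all assumptions made for illustration, not the published algorithm.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))
THREE_LETTER = "|".join(THREE_TO_ONE)

# Hypothetical patterns: wild-type residue, position, mutant residue.
ONE_LETTER_RE = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
THREE_LETTER_RE = re.compile(rf"\b({THREE_LETTER})(\d+)({THREE_LETTER})\b")


def find_point_mutations(text):
    """Return a set of (wild_type, position, mutant) tuples found in text."""
    hits = set()
    for wt, pos, mut in ONE_LETTER_RE.findall(text):
        hits.add((wt, int(pos), mut))
    for wt, pos, mut in THREE_LETTER_RE.findall(text):
        hits.add((THREE_TO_ONE[wt], int(pos), THREE_TO_ONE[mut]))
    return hits


def is_plausible(mutation, sequence):
    """Assumed filter: the reference sequence must carry the wild-type
    residue at the 1-based position, and the mutant must differ from it."""
    wt, pos, mut = mutation
    return wt != mut and 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


if __name__ == "__main__":
    text = "The D2A and Ser3Ala mutants lost binding, but K9E was not tested."
    toy_sequence = "MDSAKT"  # made-up sequence, not a real receptor
    for m in sorted(find_point_mutations(text)):
        print(m, "plausible" if is_plausible(m, toy_sequence) else "rejected")
```

A filter of this kind matters because a bare pattern also matches strings that are not mutations (cell-line names such as T47D have the same shape), so checking candidates against the sequence data already held in the database is one plausible way to raise precision.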

Bibliographic record

  • Source
    Bioinformatics | 2004, Issue 4 | pp. 557-568 | 12 pages
  • Author affiliations

    Department of Cellular and Molecular Pharmacology, University of California, San Francisco, Genentech Hall, Box 2240, 600 16th Street, San Francisco, CA 94143, USA

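  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Biological sciences; Bioengineering (biotechnology)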
