
Extracting signal from noise in biological data: Evaluations and applications of text mining and sequence coevolution.

Abstract

As the quantity of biological data continues to expand, it is the role of the computational biologist to develop new methods and tools that efficiently and accurately translate biological data into biological knowledge. Focusing on biomedical literature and biological sequences, this dissertation is about techniques for learning more from biological data.

The early chapters address biomedical text mining, and specifically the problem of automatically compiling data on protein point mutations from biomedical literature. Protein point mutations and substitutions are central to many areas of biomedical research, including human disease, biodiversity, and protein structure/function relationships. Mutation databases exist to centralize known information, but are frequently expensive to compile. An automated approach is presented for developing high-performance text mining systems and is applied to develop MutationFinder, a tool that scans text and extracts descriptions of point mutations into structured formats. Manual and automated approaches for annotating mutations are then compared, leading to the conclusion that combining automatic and manual annotation tools may be the best approach to developing comprehensive and accurate biomedical databases.

The later chapters focus on identifying pairs of coevolving positions in proteins. Just as macroscopic structures like the bills of hummingbirds and the corolla tubes of flowering plants coevolve, it is expected that interacting positions within and between proteins also coevolve to maintain highly specific interactions. If so, coevolutionary signals should be detectable in multiple sequence alignments and may contain information on intramolecular or intermolecular interactions between amino acid residues. An analysis of coevolution algorithms leads to the surprising conclusion that algorithms that do not incorporate phylogeny can match the performance of those that do. A coevolution algorithm is then applied to predict interactions between component proteins of the Type VI Secretion System (T6SS), leading to a new model of the T6SS.

Additional contributions of this work include two open-source software projects, MutationFinder and the PyCogent coevolution module. These high-quality software tools allow for the reproduction, application, and expansion of the work presented in this dissertation: text mining for point mutations, and detecting coevolution of biological sequences.
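To make the idea of a coevolutionary signal in a multiple sequence alignment concrete, the sketch below scores pairs of alignment columns with mutual information, one common measure that ignores phylogeny. The abstract does not say which algorithms the dissertation evaluates, so the choice of measure is an assumption. The function names and the toy alignment are hypothetical and are not taken from the PyCogent coevolution module.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of the observed symbol frequencies."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(alignment, i, j):
    """MI(i, j) = H(i) + H(j) - H(i, j) for two alignment columns.

    High values mean the residue at one position helps predict the residue
    at the other, which is the kind of signal coevolution would leave.
    No tree is used, so shared ancestry is not corrected for.
    """
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)


# Hypothetical toy alignment: columns 0 and 2 vary together,
# column 1 is invariant, and column 3 varies independently of column 0.
toy_alignment = [
    "ARKD",
    "ARKE",
    "GRLD",
    "GRLE",
]

if __name__ == "__main__":
    for i, j in [(0, 2), (0, 3), (0, 1)]:
        print(i, j, mutual_information(toy_alignment, i, j))
```

In this toy case the linked pair (columns 0 and 2) receives a higher score than the independent pair (columns 0 and 3). Real alignments are far noisier, which is why the comparison of scoring methods described in the abstract matters.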

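Bibliographic record

  • Author: Caporaso, J. Gregory
  • Author affiliation: University of Colorado at Denver
  • Degree-granting institution: University of Colorado at Denver
  • Subject: Biology Bioinformatics
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 177 p.
  • Total pages: 177
  • Original format: PDF
  • Language: English (eng)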
