Machine Learning Applications in Genomics, Protein Folding and Protein-Protein Interactions

Abstract

The field of machine learning, which aims to develop computer algorithms that improve with experience, has in recent years helped scientists understand a vast and diverse array of biological phenomena. Through the analysis of large and complex datasets by efficient and intelligent algorithms, huge advances have been made in understanding the biological processes taking place in the cell and the underlying causes of many diseases and abnormalities. Consequently, the development of new drugs and treatments has become possible.

This thesis presents machine learning solutions for three biological problems. The first problem concerns building models that predict the structural similarity of a docked protein complex to its native form. Using a set of physico-chemical features and evolutionary conservation, these models not only rank candidate complexes relative to each other, but also outperform the built-in scoring functions of the docking programs used to generate the complexes. The second problem studies how point mutations can affect the structure, and consequently the stability, of a protein by employing machine learning methods to predict the change in the free energy of the protein. This approach, which has the potential to provide insight into the effects of multiple amino acid mutations as well as single mutations, does not require costly calculations of energy functions that rely on atomic-level statistical mechanics and molecular energetics. In the third part of this work, a method is proposed to identify reads from paired-end sequencing data that contain inter-chromosomal translocation or insertion breakpoints. The huge search space in this problem is examined by applying a distance-preserving embedding algorithm to solve the approximate nearest neighbor problem. Experimental validation and comparison with similar existing methods show the advantages of this approach in detecting breakpoints efficiently and accurately.
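The abstract does not say which distance-preserving embedding the thesis uses, so the following is only a minimal sketch of the general idea, assuming a Gaussian random projection (in the style of the Johnson-Lindenstrauss lemma) as a stand-in. The data are synthetic vectors rather than sequencing reads, and the names `make_projection`, `embed`, `nearest`, and `demo` are hypothetical helpers introduced for illustration.

```python
import math
import random


def make_projection(dim_in, dim_out, seed=0):
    """Draw a dim_out x dim_in Gaussian matrix scaled by 1/sqrt(dim_out).

    With this scaling, squared distances are preserved in expectation.
    This is an assumed stand-in, not the embedding used in the thesis.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def embed(matrix, vector):
    """Map a high-dimensional vector to the low-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(embedded_points, embedded_query):
    """Return the index of the closest point in the embedded space."""
    best_index, best_dist = -1, float("inf")
    for i, point in enumerate(embedded_points):
        d = euclidean(point, embedded_query)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


def demo():
    """Search for a slightly perturbed copy of point 7 among 50 synthetic points."""
    rng = random.Random(42)
    points = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(50)]
    query = [x + rng.gauss(0.0, 0.01) for x in points[7]]

    # Embed 100-dimensional vectors into 16 dimensions, then search there.
    matrix = make_projection(100, 16, seed=1)
    embedded = [embed(matrix, p) for p in points]
    return nearest(embedded, embed(matrix, query))


if __name__ == "__main__":
    print(demo())
```

The search in this sketch is still a linear scan; the saving comes only from comparing 16-dimensional vectors instead of 100-dimensional ones. A practical approximate nearest neighbor method would typically pair such an embedding with an index structure, which is outside what the abstract describes.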

Bibliographic record

  • Author: Farhoodi, Roshanak
  • Author affiliation: University of Massachusetts Boston
  • Degree-granting institution: University of Massachusetts Boston
  • Subjects: Computer science; Artificial intelligence; Genetics
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 159
  • Original format: PDF
  • Language of text: English