
Computational prediction of essential genes, and other applications of bioinformatics to genome annotation.



Abstract

The large-scale identification and characterization of genes is an important challenge. Hundreds of genomes have now been sequenced; the next step is discerning which regions encode functional products. This is often achieved with a mix of computational and experimental techniques. Three such techniques (prediction of essential genes, large-scale transposon mutagenesis, and tiling microarrays) are the focus of the bioinformatics research presented here.

Essential genes are necessary for basic survival: disruption of even one is lethal to an organism. The ability to identify such genes in pathogens is understandably useful for drug design. Predicting essential genes in silico is particularly appealing because it circumvents expensive and difficult experimental screens. To date, most such prediction has concentrated on homology comparison to other species. This thesis presents a bioinformatics approach that employs characteristic features of a gene's sequence to estimate essentiality, and offers a promising way to identify antimicrobial drug targets in unstudied organisms.

A machine-learning classifier was trained on known essential genes in the model yeast Saccharomyces cerevisiae, and applied to the closely related but relatively unstudied yeast Saccharomyces mikatae. The resulting predictions aligned well with homology-based estimates, and a subset was verified with in vivo knockouts in S. mikatae.

Next, the question of feature choice was addressed. Given an unstudied pathogen and the goal of identifying essential genes, are functional genomics assays worth performing, or will sequence data suffice? Three different feature classes (sequence-based, sequence-derived, and experimental data) were assessed alone and in combination with a simple machine learner. The amalgamated feature set recovered the highest rate of true-positive predictions, whereas functional genomics data alone returned the highest ratio of true positives to false positives. The results suggest that experimental data are indeed valuable, but where they are unavailable, complementary sequence features perform nearly as well.

Also presented here are bioinformatics approaches to characterize transposon insertion bias on a genomic scale, and to optimize the performance of whole-genome tiling microarrays through the inclusion of mismatch oligonucleotides.

Together, these studies present an effective method to identify essential genes, and demonstrate the applicability of bioinformatics techniques to current issues in genome annotation.
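The feature-set comparison in the abstract rests on two measures: the rate of true-positive predictions and the ratio of true positives to false positives. The following is a minimal sketch of how those two measures could be computed for a binary essential/non-essential classifier. The function names and the toy labels are assumptions made for illustration only; they are not the thesis's code, data, or results.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for binary labels where 1 means 'essential'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly essential genes that were predicted essential."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def tp_fp_ratio(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positives per false positive; infinite when there are no false positives."""
    tp, fp, _ = confusion_counts(y_true, y_pred)
    return tp / fp if fp else float("inf")


# Hypothetical toy labels (NOT thesis data): 4 essential genes, 6 non-essential.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# A permissive predictor: finds every essential gene but also flags two others.
pred_permissive = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# A conservative predictor: misses one essential gene but flags only one other.
pred_conservative = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    for name, pred in [("permissive", pred_permissive), ("conservative", pred_conservative)]:
        print(name, true_positive_rate(y_true, pred), tp_fp_ratio(y_true, pred))
```

In this toy setup the permissive predictor has the higher true-positive rate while the conservative one has the higher ratio of true positives to false positives, which is the kind of trade-off the two measures are meant to expose.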

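Bibliographic record

  • Author: Seringhaus, Michael Rolf
  • Author affiliation: Yale University
  • Degree-granting institution: Yale University
  • Discipline: Bioinformatics
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 208 p.
  • Total pages: 208
  • Original format: PDF
  • Language of text: English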
