A data mining study of G-quadruplexes and their effect on DNA replication.

Abstract

G-quadruplexes are guanine-rich DNA sequences that can form non-Watson-Crick four-stranded structures. They have been found in various regions of the genome and are believed to play a biological role. We hypothesize that these structures pose a barrier to DNA replication by standard DNA polymerases, so that completing replication requires the intervention of alternative polymerases that are robust but error-prone. To test this hypothesis in silico, we assumed that error-prone replication could be inferred from the degree of sequence variation at these sites. We analyzed the density of single nucleotide polymorphisms (SNPs) in the neighborhood of potential G-quadruplex sequences in the human genome. The analysis shows a significantly higher SNP density within G-quadruplexes. Further, there is evidence of a directional bias in the extent of error, seen as an asymmetry in SNP incidence on either side of quadruplexes. Taken together, the evidence favors the hypothesis that G-quadruplexes have a deleterious effect on the fidelity of DNA replication.

A secondary research goal of the thesis is to reduce the number of false positives when G-quadruplexes are predicted from sequence information alone. Most current algorithms are regular expression searches based on sequences that have shown potential to form G-quadruplexes. Using the results of our investigation of sequence variation, predicted melting temperature, and machine learning models, we analyzed attributes derived solely from the sequences to determine whether classification can be performed accurately. We conclude that factors external to the sequence may be important in determining if and when G-quadruplexes form.
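The kind of analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state which regular expression or window size the thesis used, so everything here is an assumption: the motif is a commonly used form (four runs of at least three guanines separated by loops of 1 to 7 bases), and the names `G4_PATTERN`, `find_g4`, `snp_density`, and `flank_profile`, as well as the `flank` width, are hypothetical.

```python
import re

# Assumed motif, not taken from the thesis: four G-runs of length >= 3
# separated by three loops of 1-7 bases each.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4(seq):
    """Return half-open (start, end) spans of non-overlapping motif matches.

    Only the given strand is searched; a C-rich pattern would be needed
    to detect quadruplexes on the complementary strand.
    """
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]


def snp_density(snp_positions, start, end):
    """SNPs per base in the half-open interval [start, end)."""
    if end <= start:
        return 0.0
    count = sum(1 for p in snp_positions if start <= p < end)
    return count / (end - start)


def flank_profile(seq, snp_positions, flank=10):
    """Compare SNP density upstream of, within, and downstream of each match.

    `flank` is an illustrative window width in bases; flanks are clipped
    at the sequence boundaries.
    """
    rows = []
    for start, end in find_g4(seq):
        left = max(0, start - flank)
        right = min(len(seq), end + flank)
        rows.append({
            "span": (start, end),
            "upstream": snp_density(snp_positions, left, start),
            "within": snp_density(snp_positions, start, end),
            "downstream": snp_density(snp_positions, end, right),
        })
    return rows


if __name__ == "__main__":
    # Toy sequence and made-up SNP positions, for illustration only.
    toy_seq = "ATATATATAT" + "GGGTTAGGGTTAGGGTTAGGG" + "ATATATATAT"
    toy_snps = [2, 12, 20, 33, 35]
    for row in flank_profile(toy_seq, toy_snps):
        print(row)
```

Comparing the `upstream` and `downstream` values across many matches is one simple way to look for the kind of asymmetry the abstract reports. A real analysis would also need strand orientation, genome-wide SNP annotations, and a statistical test, none of which this sketch attempts.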

Bibliographic record

  • Author

    Nichols, Gregory Shannon

  • Author affiliation

    University of Missouri - Kansas City

  • Degree-granting institution: University of Missouri - Kansas City
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: M.S.
  • Year: 2012
  • Pages: 55
  • Original format: PDF
  • Language: English