Bioinformatics Analyses for Next-Generation Sequencing of Plasma DNA.

Abstract

The presence of fetal DNA in the cell-free plasma of pregnant women was first described in 1997. The initial clinical applications of this phenomenon focused on the detection of paternally inherited traits such as sex and rhesus D blood group status. The development of massively parallel sequencing technologies has allowed more sophisticated analyses of circulating cell-free DNA in maternal plasma. For example, by determining the proportional representation of chromosome 21 sequences in maternal plasma, noninvasive prenatal diagnosis of fetal Down syndrome can be achieved with an accuracy of >98%. In the first part of my thesis, I developed bioinformatics algorithms to construct a genome-wide fetal genetic map from the massively parallel sequencing data of the maternal plasma DNA of a pregnant woman. Constructing the fetal genetic map from maternal plasma sequencing data is very challenging because fetal DNA constitutes only approximately 10% of the maternal plasma DNA. Moreover, because the fetal DNA in maternal plasma exists as short fragments of less than 200 bp, existing bioinformatics techniques for genome construction are not applicable for this purpose. To construct the genome-wide fetal genetic map, I used the genomes of the father and the mother as scaffolds and calculated the fractional fetal DNA concentration. First, I examined the paternal-specific sequences in maternal plasma to determine which portions of the father's genome had been passed on to the fetus. To determine the maternal inheritance, I developed the Relative Haplotype Dosage (RHDO) approach. This method is based on the principle that the portion of the maternal genome inherited by the fetus is present at a slightly higher concentration in maternal plasma. Haplotype information allows the sequencing data to be used more effectively.
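The dosage-imbalance principle behind RHDO can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the decision threshold, the sequential log-likelihood-ratio rule, and the restriction to SNPs where the mother is heterozygous and the father is homozygous for the allele carried on maternal haplotype I are all assumptions made for illustration. Under those assumptions, haplotype I is expected to make up (1 + f) / 2 of the reads if the fetus inherited it, and 1/2 otherwise, where f is the fractional fetal DNA concentration.

```python
import math


def hap_log_likelihood_ratio(hap1_reads, hap2_reads, fetal_fraction):
    """Log-likelihood ratio of 'Hap I over-represented' versus 'balanced'.

    Assumes SNPs where the mother is heterozygous and the father is
    homozygous for the allele on maternal haplotype I (illustrative setup).
    """
    p_over = (1.0 + fetal_fraction) / 2.0  # Hap I share if fetus inherited Hap I
    p_balanced = 0.5                       # Hap I share if fetus inherited Hap II
    return (hap1_reads * math.log(p_over / p_balanced)
            + hap2_reads * math.log((1.0 - p_over) / (1.0 - p_balanced)))


def classify_block(snp_counts, fetal_fraction, threshold=math.log(1000)):
    """Accumulate read counts SNP by SNP along a haplotype block.

    snp_counts: iterable of (reads on Hap I allele, reads on Hap II allele).
    The threshold is a hypothetical cut-off, not a value from the thesis.
    """
    hap1_total = hap2_total = 0
    for hap1_reads, hap2_reads in snp_counts:
        hap1_total += hap1_reads
        hap2_total += hap2_reads
        llr = hap_log_likelihood_ratio(hap1_total, hap2_total, fetal_fraction)
        if llr >= threshold:
            return "Hap I"      # over-representation: fetus inherited Hap I
        if llr <= -threshold:
            return "Hap II"     # balance: fetus inherited Hap II
    return "unclassified"       # not enough evidence yet
```

With a fetal fraction of 0.1, a single SNP covered by 55 versus 45 reads stays unclassified, whereas the same ratio accumulated over a run of linked SNPs reaches a decision. This is the sense in which pooling counts along a haplotype reduces the depth needed per locus.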
With haplotype information, the maternal inheritance can thus be determined at a much lower sequencing depth than by examining individual loci in the genome. This algorithm makes it feasible to use genome-wide scanning to diagnose fetal genetic disorders prenatally and noninvasively.

With the emergence of targeted massively parallel sequencing, the sequencing cost per base is falling dramatically. Although the first part of the thesis developed a method to estimate the fractional fetal DNA concentration using parental genotype information, that method cannot deduce the fractional fetal DNA concentration directly from sequencing data without prior knowledge of the genotypes. In the second part of this thesis, I propose FetalQuant, a method based on a statistical mixture model that uses maximum likelihood to estimate the fractional fetal DNA concentration directly from targeted massively parallel sequencing of maternal plasma DNA. Compared with existing methods, it removes the need for genotype information without loss of accuracy. Furthermore, by applying Bayes' rule, the method can identify the informative SNPs at which the mother is homozygous and the fetus is heterozygous, which has the potential to detect dominantly inherited disorders.

Besides genetic analysis at the DNA level, epigenetic markers are also valuable for the development of noninvasive diagnostics. In the third part of this thesis, I developed a bioinformatics algorithm to efficiently analyze genome-wide DNA methylation status based on massively parallel sequencing of bisulfite-converted DNA. DNA methylation is one of the most important mechanisms for regulating gene expression. Studying the DNA methylation of different genes is important for understanding different physiological and pathological processes. Currently, the most popular method for analyzing DNA methylation status is bisulfite sequencing.
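Returning to the second part of the thesis, the idea of estimating the fractional fetal DNA concentration by maximum likelihood under a mixture model, and then applying Bayes' rule per SNP, can be sketched in Python. This is only an illustration and not FetalQuant itself: the four mother/fetus genotype classes with fixed equal weights, the error rate `ERR`, the use of minor-allele counts, the grid search, and all function names are assumptions introduced here.

```python
import math

ERR = 0.005                          # assumed sequencing error rate (hypothetical)
WEIGHTS = (0.25, 0.25, 0.25, 0.25)   # assumed fixed mixture weights (hypothetical)


def log_binom_pmf(b, n, p):
    """Log of the binomial probability of b minor-allele reads out of n."""
    return (math.lgamma(n + 1) - math.lgamma(b + 1) - math.lgamma(n - b + 1)
            + b * math.log(p) + (n - b) * math.log(1.0 - p))


def component_log_terms(b, n, f):
    """Log(weight * likelihood) for each mother/fetus genotype class.

    Expected minor-allele fractions for fetal fraction f:
    AA/AA -> ERR, AA/AB -> f/2, AB/AA -> 0.5 - f/2, AB/AB -> 0.5
    """
    probs = (ERR, f / 2.0, 0.5 - f / 2.0, 0.5)
    return [math.log(w) + log_binom_pmf(b, n, p) for w, p in zip(WEIGHTS, probs)]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(counts, f):
    """Mixture log-likelihood of all SNPs; counts is a list of (b, n)."""
    return sum(log_sum_exp(component_log_terms(b, n, f)) for b, n in counts)


def estimate_fetal_fraction(counts):
    """Maximum-likelihood estimate of f by a simple grid search."""
    grid = [i / 1000.0 for i in range(20, 500)]
    return max(grid, key=lambda f: log_likelihood(counts, f))


def posterior_mother_hom_fetus_het(b, n, f):
    """Bayes' rule: posterior that a SNP is in the AA/AB class."""
    terms = component_log_terms(b, n, f)
    return math.exp(terms[1] - log_sum_exp(terms))
```

For example, calling `estimate_fetal_fraction` on a list of `(minor_allele_reads, total_reads)` pairs returns the grid value of f that best explains all SNPs jointly, and `posterior_mother_hom_fetus_het` then flags the SNPs most likely to be informative. A real implementation would also need to estimate the mixture weights and handle overdispersion, which this sketch ignores.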
The principle of bisulfite sequencing is that bisulfite treatment chemically converts unmethylated cytosine residues to uracil, whereas methylated cytosines remain unchanged. The converted uracil and unconverted cytosine can then be discriminated by sequencing. With the emergence of massively parallel sequencing platforms, bisulfite sequencing analysis can be performed on a genome-wide scale. However, the bioinformatics analysis of genome-wide bisulfite sequencing data is much more complicated than the analysis of data from individual loci. I therefore developed Methyl-Pipe, a bioinformatics program for analyzing the genome-wide methylation status of DNA samples based on massively parallel sequencing. In the first step of this algorithm, an in silico converted reference genome is produced by converting all cytosine residues to thymine residues. Then, the sequenced reads of bisulfite-converted DNA are aligned to this modified reference sequence. Finally, post-processing of the alignments removes non-unique and low-quality mappings and characterizes the methylation pattern genome-wide. Using this new program, potential fetal-specific hypomethylated regions that could serve as blood biomarkers can be identified across the genome.
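The three steps described above (in silico conversion of the reference, alignment of reads to the converted reference, and post-processing into methylation calls) can be sketched in a few lines of Python. This is a toy illustration, not Methyl-Pipe: exact string matching stands in for a real aligner, only forward-strand reads are handled, no base-quality filtering is done, and all function names are hypothetical.

```python
def c_to_t(seq):
    """In silico bisulfite conversion: replace every C with T."""
    return seq.upper().replace("C", "T")


def find_all(haystack, needle):
    """Return every start position at which needle occurs in haystack."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def call_methylation(reference, reads):
    """Count methylated/unmethylated read bases at each reference cytosine.

    Returns {reference position: [methylated count, unmethylated count]}.
    """
    converted_ref = c_to_t(reference)
    calls = {}
    for read in reads:
        # Align the fully converted read to the converted reference.
        hits = find_all(converted_ref, c_to_t(read))
        if len(hits) != 1:
            continue  # discard unmapped and non-unique mappings
        offset = hits[0]
        # Compare the original read with the original reference.
        for i, base in enumerate(read.upper()):
            pos = offset + i
            if reference[pos].upper() != "C":
                continue
            if base == "C":      # cytosine survived bisulfite: methylated
                calls.setdefault(pos, [0, 0])[0] += 1
            elif base == "T":    # cytosine read as thymine: unmethylated
                calls.setdefault(pos, [0, 0])[1] += 1
    return calls
```

Converting both the reference and the reads before matching is what lets a read align regardless of its methylation state. The methylation information is recovered afterwards by comparing the unconverted read with the unconverted reference at each cytosine position.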

Bibliographic record

  • Author

    Jiang, Peiyong

  • Author affiliation

    The Chinese University of Hong Kong (Hong Kong)

  • Degree-granting institution: The Chinese University of Hong Kong (Hong Kong)
  • Subject: Health Sciences Pathology; Biology Biostatistics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 124 p.
  • Total pages: 124
  • Original format: PDF
  • Language: English
