Journal: Frontiers in Plant Science

Performance of Mapping Approaches for Whole-Genome Bisulfite Sequencing Data in Crop Plants



Abstract

DNA methylation is involved in many biological processes that affect the development and well-being of crop plants, such as transposon activation, heterosis, environment-dependent transcriptome plasticity, aging, and many diseases. Whole-genome bisulfite sequencing (WGBS) is an excellent technology for detecting and quantifying DNA methylation patterns in a wide variety of species, but optimized data analysis pipelines exist only for a small number of species and are missing for many important crop plants. This gap matters because most existing benchmark studies have been performed on mammals with hardly any repetitive elements and without CHG and CHH methylation. Pipelines for the analysis of whole-genome bisulfite sequencing data usually consist of four steps: read trimming, read mapping, quantification of methylation levels, and prediction of differentially methylated regions (DMRs). Here we focus on read mapping, which is challenging because unmethylated cytosines are converted to uracil during bisulfite treatment and to thymine during the subsequent polymerase chain reaction, so read mappers must be able to deal with this cytosine/thymine polymorphism. Several read mappers have been developed in recent years, with different strengths and weaknesses, but their performance has not been critically evaluated. Here, we compare eight read mappers, Bismark, BismarkBwt2, BSMAP, BS-Seeker2, Bwameth, GEM3, Segemehl, and GSNAP, to assess the impact of the read-mapping results on the prediction of DMRs.
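To make the cytosine/thymine polymorphism concrete, the following minimal Python sketch simulates bisulfite conversion on a forward-strand read and shows why a naive comparison against the reference reports mismatches. It also illustrates one widely used workaround, comparing read and reference in a reduced three-letter alphabet. The function names, the toy sequence, and the methylated position are assumptions made for illustration; they are not taken from the paper or from any of the eight mappers.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment plus PCR on the forward strand:
    every unmethylated C is read out as T, methylated Cs stay C.
    `methylated` is a set of 0-based positions (hypothetical input)."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def to_three_letter(seq):
    """Collapse C into T so that C/T differences no longer count as mismatches."""
    return seq.replace("C", "T")


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


reference = "ACGTCCGATCGA"                              # toy reference fragment
read = bisulfite_convert(reference, methylated={1})     # only the C at index 1 is methylated

naive_mismatches = hamming(reference, read)
three_letter_mismatches = hamming(to_three_letter(reference), to_three_letter(read))

print(read, naive_mismatches, three_letter_mismatches)
```

In this toy case the three converted cytosines look like three sequencing errors to a conventional comparison, while the three-letter comparison sees none. The price is reduced sequence complexity, which is one reason repetitive crop genomes are harder to map uniquely.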
We used simulated data generated from the genomes of Arabidopsis thaliana, Brassica napus, Glycine max, Solanum tuberosum, and Zea mays; monitored the effects of the bisulfite conversion rate, the sequencing error rate, the maximum number of allowed mismatches, and the genome structure and size; and calculated precision, number of uniquely mapped reads, distribution of the mapped reads, run time, and memory consumption as features for benchmarking the eight read mappers. Furthermore, we validated our findings using real-world data from Glycine max and showed the influence of the mapping step on DMR calling in WGBS pipelines. We found that the conversion rate had only a minor impact on the mapping quality and the number of uniquely mapped reads, whereas the error rate and the maximum number of allowed mismatches had a strong impact and led to differences in the performance of the eight read mappers. In conclusion, we recommend BSMAP, which needs the shortest run time and yields the highest precision, and Bismark, which requires the smallest amount of memory and yields precision and high numbers of uniquely mapped reads.
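With simulated reads the true origin of every read is known, so metrics such as precision and the number of uniquely mapped reads can be computed directly. The sketch below shows one plausible way to do this in Python. The abstract does not define precision, so the definition used here (correctly placed uniquely mapped reads divided by all uniquely mapped reads) is an assumption, as are the data layout, the `benchmark` function, and the toy reads.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, start); hypothetical representation


def benchmark(truth: Dict[str, Position],
              alignments: Dict[str, List[Position]],
              tolerance: int = 0) -> Dict[str, float]:
    """Summarise a mapper's output against the simulated ground truth.

    truth:      read id -> position the read was simulated from
    alignments: read id -> all positions reported by the mapper
    tolerance:  allowed offset (in bases) for a placement to count as correct
    """
    # A read is uniquely mapped if the mapper reports exactly one position.
    unique = {rid: hits[0] for rid, hits in alignments.items() if len(hits) == 1}
    correct = sum(
        1
        for rid, (chrom, pos) in unique.items()
        if rid in truth
        and truth[rid][0] == chrom
        and abs(truth[rid][1] - pos) <= tolerance
    )
    n_unique = len(unique)
    return {
        "uniquely_mapped": n_unique,
        "unique_fraction": n_unique / len(truth) if truth else 0.0,
        # Assumed definition: correct unique placements / all unique placements.
        "precision": correct / n_unique if n_unique else 0.0,
    }


truth = {"r1": ("chr1", 100), "r2": ("chr1", 250),
         "r3": ("chr2", 40), "r4": ("chr2", 900)}
alignments = {
    "r1": [("chr1", 100)],               # unique and correct
    "r2": [("chr1", 300)],               # unique but misplaced
    "r3": [("chr2", 40), ("chr1", 40)],  # multi-mapped, not counted as unique
    # r4 is absent: the mapper did not place it at all
}
result = benchmark(truth, alignments)
print(result)
```

Reporting precision together with the count of uniquely mapped reads matters because a mapper can raise one at the expense of the other, for example by discarding every ambiguous read.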
