Frontiers in Plant Science

De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences


Abstract

Whole-genome shotgun (WGS) sequences of plant species often contain an abundance of reads derived from the chloroplast genome. Until now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes, especially in non-model species for which no close relatives have been sequenced. The alternative is to assemble the chloroplast genome de novo from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads, and assembled them using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps left by coverage variation in the WGS dataset. To demonstrate the universality of our approach, we successfully assembled de novo three complete chloroplast genomes from plant species with a range of nuclear genome sizes: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb), and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short-read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
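The idea of selecting chloroplast reads by k-mer frequency can be illustrated with a small sketch. It rests on the assumption that chloroplast-derived reads are far more abundant in a WGS dataset than nuclear reads, so their k-mers have much higher counts. The abstract does not describe the pipeline's internals, so everything below is hypothetical and is not the authors' implementation: the function names, the `min_median_count` threshold, and the use of the median as the per-read statistic. For brevity the sketch counts forward-strand k-mers only and holds all reads in memory.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, an assumed simplification)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_table(reads, k):
    """Count how often each k-mer occurs across all reads."""
    table = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def select_high_frequency_reads(reads, k, min_median_count):
    """Keep reads whose median k-mer count reaches the (hypothetical) threshold.

    High-copy sequences such as the chloroplast genome are expected to
    produce reads made of frequent k-mers; low-copy nuclear reads are not.
    """
    table = build_kmer_table(reads, k)
    selected = []
    for read in reads:
        counts = [table[km] for km in kmers(read, k)]
        # Reads shorter than k yield no k-mers and are skipped.
        if counts and median(counts) >= min_median_count:
            selected.append(read)
    return selected


if __name__ == "__main__":
    # Toy data: one sequence present five times, two sequences present once.
    abundant = "ACGTTGCAAGGCTTAC"
    reads = [abundant] * 5 + ["TTTTGGGGCCCCAAAA", "CATGCATGGTACCTAG"]
    kept = select_high_frequency_reads(reads, k=5, min_median_count=3)
    print(len(kept), "of", len(reads), "reads kept")
```

In this toy setting both `k` and `min_median_count` change which reads are kept, which mirrors the abstract's remark that the choice of k and the amount of data need to be optimized. A real dataset would also need a dedicated k-mer counter and a way to separate other high-copy sequences, such as mitochondrial DNA or nuclear repeats, from chloroplast reads.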
