
ECuADOR—Easy Curation of Angiosperm Duplicated Organellar Regions, a tool for cleaning and curating plastomes assembled from next generation sequencing pipelines



Abstract

Background: With the rapid increase in genomic resources produced by Next-Generation Sequencing (NGS) and the availability of free online genomic databases, efficient and standardized metadata curation approaches have become increasingly critical for the post-processing stages of biological data. In organelle-based studies using circular chloroplast genome datasets in particular, the main structural regions are assembled in random order and orientation, which severely limits our ability to generate "ready-to-align" datasets for phylogenetic reconstruction at both small and large taxonomic scales. In addition, current practices discard the most variable regions of the genomes to facilitate the alignment of the remaining coding regions. Nevertheless, no software is currently available that curates plastomes to this degree through simple detection, organization and positioning of the main plastome regions, so curation remains a time-consuming and error-prone process. Here we introduce ECuADOR, a fast and user-friendly Perl script specifically designed to automate the detection and reorganization of newly assembled plastomes obtained from any available source (NGS, Sanger sequencing or assembler output).

Methods: ECuADOR uses a sliding-window approach to detect long repeated sequences in draft sequences and from these identifies the inverted repeat regions (IRs), even in the presence of artifactual breaks or sequencing errors. It then automates the rearrangement of the sequence to the widely used LSC–IRb–SSC–IRa order. This facilitates rapid post-editing steps such as creation of genome alignments, detection of variable regions, SNP detection and phylogenomic analyses.

Results: ECuADOR was successfully tested on plant families throughout the angiosperm phylogeny by curating 161 chloroplast datasets. ECuADOR first identified and reordered the central regions (LSC–IRb–SSC–IRa) for each dataset and then produced a new annotation for the chloroplast sequences. The process took less than 20 min with a maximum memory requirement of 150 MB and an accuracy of over 99%.

Conclusions: ECuADOR is the only de novo, one-step recognition and reordering tool that facilitates the post-processing analysis of extranuclear genomes from NGS data. The program is available at https://github.com/BiodivGenomic/ECuADOR/.
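
The detection and reordering steps described under Methods can be illustrated with a minimal Python sketch. This is not ECuADOR's Perl implementation: it assumes an uppercase A/C/G/T sequence, uses exact k-mer seeding with exact extension (so it does not tolerate the sequencing errors or artifactual breaks that ECuADOR is reported to handle), and the function names `find_inverted_repeat` and `reorder_plastome` as well as the parameters `k` and `step` are hypothetical. It also assumes that the longer single-copy region is the LSC and does not try to resolve the orientation of the single-copy regions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeat(seq, k=20, step=1):
    """Longest exact inverted repeat on a circular sequence.

    Returns (start_a, start_b, length), where the two arms are reverse
    complements of each other, or None if no seed of length k is found.
    """
    n = len(seq)
    if n < 2 * k:
        return None
    circ = seq + seq[:k - 1]            # lets windows cross the origin
    comp = seq.translate(_COMPLEMENT)   # complement, same coordinates
    index = {}
    for j in range(n):                  # index every k-mer start position
        index.setdefault(circ[j:j + k], []).append(j)

    best = None
    for i in range(0, n, step):         # slide the seed window
        for j in index.get(revcomp(circ[i:i + k]), ()):
            d = (j - i) % n
            if d < k or n - d < k:      # seed windows overlap: skip
                continue
            # Extend right of window i / left of window j.
            a = 0
            while (a < (d - k) // 2
                   and seq[(i + k + a) % n] == comp[(j - 1 - a) % n]):
                a += 1
            # Extend left of window i / right of window j.
            b = 0
            while (b < (n - d - k) // 2
                   and seq[(i - 1 - b) % n] == comp[(j + k + b) % n]):
                b += 1
            length = k + a + b
            if best is None or length > best[2]:
                best = ((i - b) % n, (j - a) % n, length)
    return best


def reorder_plastome(seq, k=20, step=1):
    """Rotate a circular plastome into LSC-IRb-SSC-IRa order.

    Returns (rotated_sequence, regions) with half-open coordinates,
    or None if no inverted repeat is detected.
    """
    hit = find_inverted_repeat(seq, k, step)
    if hit is None:
        return None
    s1, s2, length = hit
    n = len(seq)
    gap1 = (s2 - s1 - length) % n       # single-copy region after arm 1
    gap2 = (s1 - s2 - length) % n       # single-copy region after arm 2
    # Assumption: the longer single-copy region is the LSC.
    start = (s1 + length) % n if gap1 >= gap2 else (s2 + length) % n
    lsc_len = max(gap1, gap2)
    rotated = (seq + seq)[start:start + n]
    regions = {
        "LSC": (0, lsc_len),
        "IRb": (lsc_len, lsc_len + length),
        "SSC": (lsc_len + length, n - length),
        "IRa": (n - length, n),
    }
    return rotated, regions
```

Calling `reorder_plastome(sequence)` on an assembled circular sequence returns the rotated sequence together with the coordinates of the four regions. For full-size plastomes a larger `step` keeps the seed search fast, because every position is still indexed as a possible match target.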
