Plant Molecular Biology Reporter

Development of TBSPG Pipelines for Refining Unique Mapping and Repetitive Sequence Detection Using the Two Halves of Each Illumina Sequence Read

Abstract

We developed six pipelines (TBSPG) for mapping Illumina sequence reads to reference genomes, refining unique mapping, and computing the number of mapped reads and the coverage. The pipelines provide options for multi-mapping or unique mapping; for paired-end or single-end read input files; for removing or retaining nucleus-organelle shared sequences; and for mapping either the full-length reads or the two halves of each read, the latter to refine the detection of unique and non-unique sequences. These TBSPG pipelines were built on (and named after) publicly available tools: Trimmomatic, the Burrows-Wheeler Aligner (BWA), SAMtools, Picard, and the Genome Analysis Toolkit (GATK). We developed several Perl scripts to fill the gaps between these tools, connect them, recognize half-length reads, select uniquely mapped reads, and compute the read number and coverage per chromosome and per organellar genome, writing the results in a format that Microsoft Excel can open. In a potato 100-bp paired-end sequence file (Illumina TruSeq), approximately 6.75% of the uniquely mapped full-length reads were found to contain non-unique sequences at the half-length-read level. These freely available TBSPG pipelines can be used for many read-based applications, including repetitive sequence analysis and organellar genome copy number estimation.
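The half-read idea described in the abstract can be illustrated with a minimal Python sketch. It assumes reads are already parsed into FASTQ records, splits each read at its midpoint, and, after both the full-length reads and the halves have been mapped, relabels a uniquely mapped full-length read when either of its halves maps to more than one location. The `_h1`/`_h2` name suffixes, the status labels, and all function names are hypothetical: the abstract does not say how the TBSPG Perl scripts name half-length reads or how they decide that a half is uniquely mapped (in practice that decision would come from the aligner output, e.g. BWA's mapping quality and alternative-hit tags).

```python
from __future__ import annotations

from typing import NamedTuple


class FastqRecord(NamedTuple):
    name: str  # read identifier without the leading '@'
    seq: str
    qual: str


def split_into_halves(rec: FastqRecord) -> tuple[FastqRecord, FastqRecord]:
    """Split one read at its midpoint into a first half and a second half.

    For odd read lengths the extra base goes to the second half. The
    '_h1'/'_h2' suffixes are a hypothetical naming convention that lets each
    half be matched back to its parent read after mapping.
    """
    mid = len(rec.seq) // 2
    first = FastqRecord(rec.name + "_h1", rec.seq[:mid], rec.qual[:mid])
    second = FastqRecord(rec.name + "_h2", rec.seq[mid:], rec.qual[mid:])
    return first, second


def to_fastq(rec: FastqRecord) -> str:
    """Format a record as a four-line FASTQ entry."""
    return f"@{rec.name}\n{rec.seq}\n+\n{rec.qual}\n"


def refine_unique(full_unique: dict[str, bool],
                  half_status: dict[str, str]) -> dict[str, str]:
    """Re-classify full-length reads using the mapping status of their halves.

    full_unique: parent read name -> True if the full-length read mapped uniquely.
    half_status: half-read name -> "unique", "multi" or "unmapped".
    Halves missing from half_status are treated as unmapped (an assumption),
    so only a half that maps to several locations flags its parent read.
    """
    labels = {}
    for name, is_unique in full_unique.items():
        if not is_unique:
            labels[name] = "non-unique"
            continue
        halves = (half_status.get(name + "_h1", "unmapped"),
                  half_status.get(name + "_h2", "unmapped"))
        if "multi" in halves:
            # Unique at full length, yet at least one half maps to multiple loci.
            labels[name] = "contains-repeat"
        else:
            labels[name] = "unique"
    return labels
```

Dividing the number of `contains-repeat` labels by the number of uniquely mapped full-length reads yields the kind of proportion the abstract reports for the potato data set.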
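The abstract mentions computing the mapped read number and coverage per chromosome and per organellar genome, and using them for organellar genome copy number estimation. A minimal Python sketch of that bookkeeping follows. It assumes mapped-read counts per reference sequence are already available (for example, from `samtools idxstats`), approximates mean coverage as mapped reads × read length ÷ sequence length, and writes a tab-separated table that Microsoft Excel can open. The coverage formula, the column names, the function names, and the organelle-to-nuclear coverage ratio used as a copy-number proxy are assumptions for illustration, not the authors' exact computation.

```python
from __future__ import annotations

import csv
import io


def coverage_table(read_counts: dict[str, int],
                   seq_lengths: dict[str, int],
                   read_length: int) -> list[tuple[str, int, float]]:
    """Return (sequence, mapped reads, approximate mean coverage) rows.

    Coverage is approximated as reads * read_length / sequence length;
    sequences with no entry in read_counts are reported with 0 reads.
    """
    rows = []
    for name, length in seq_lengths.items():
        n = read_counts.get(name, 0)
        rows.append((name, n, n * read_length / length))
    return rows


def to_tsv(rows: list[tuple[str, int, float]]) -> str:
    """Write the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sequence", "mapped_reads", "mean_coverage"])
    for name, n, cov in rows:
        writer.writerow([name, n, f"{cov:.2f}"])
    return buf.getvalue()


def relative_copy_number(read_counts: dict[str, int],
                         seq_lengths: dict[str, int],
                         read_length: int,
                         organelle: str,
                         nuclear: list[str]) -> float:
    """Organellar coverage divided by pooled nuclear coverage (a rough proxy)."""
    nuc_reads = sum(read_counts.get(c, 0) for c in nuclear)
    nuc_len = sum(seq_lengths[c] for c in nuclear)
    nuc_cov = nuc_reads * read_length / nuc_len
    org_cov = read_counts.get(organelle, 0) * read_length / seq_lengths[organelle]
    return org_cov / nuc_cov
```

Pooling the nuclear chromosomes before dividing weights each chromosome by its length, so short chromosomes do not dominate the nuclear baseline; whether the ratio should be further scaled by ploidy depends on the sample and is left open here.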
