Journal: BMC Bioinformatics

TAPDANCE: An automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data

Abstract

Background: Next generation sequencing approaches applied to the analysis of transposon insertion junction fragments generated in high-throughput forward genetic screens have created the need for clear informatics and statistical approaches to deal with the massive amount of data currently being generated. Previous approaches used to 1) map junction fragments within the genome and 2) identify Common Insertion Sites (CISs) within the genome are not practical due to the volume of data generated by current sequencing technologies. Previous approaches applied to this problem also required significant manual annotation.

Results: We describe the Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the bowtie short read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and used to keep track of sequences, mappings and common insertion sites. Additionally, associations between phenotypes and CISs are identified using Fisher's exact test with multiple testing correction. In a case study using previously published data, we show that the TAPDANCE software identifies CISs as previously described, prioritizes them based on p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data.

Conclusions: The TAPDANCE process is fully automated, performs similarly to previous labor-intensive approaches, provides consistent results across a wide range of sequence sampling depths, can handle extremely large datasets, enables meaningful comparison across datasets and enables large-scale meta-analyses of junction fragment data. The TAPDANCE software will greatly enhance our ability to analyze these datasets in order to increase our understanding of the genetic basis of cancers.
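
The two statistical steps named in the abstract can be illustrated with a minimal Python sketch. It is not the TAPDANCE implementation: the abstract does not give the null model, the window sizes, or the multiple testing correction, so the sketch assumes insertions fall uniformly across the genome, takes the window size as a parameter, and uses Bonferroni correction as a stand-in. All function names and parameters are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term <= total * 1e-17:
            return min(1.0, total)


def window_enrichment_pvalue(observed, total_insertions, window_bp, genome_bp):
    """P-value for seeing >= `observed` insertions in one window.

    ASSUMPTION: insertions are uniform over the genome, so the expected
    count in a window is total_insertions * window_bp / genome_bp.
    """
    expected = total_insertions * window_bp / genome_bp
    return poisson_upper_tail(observed, expected)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    For example, rows = phenotype present/absent, columns = CIS hit/not hit.
    """
    n, row1, col1 = a + b + c + d, a + b, a + c

    def pmf(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_observed = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(pvalues):
    """ASSUMPTION: Bonferroni stands in for the unspecified correction."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

Under these assumptions, candidate regions could be ranked by `window_enrichment_pvalue`, and phenotype associations screened by passing the list of `fisher_exact_two_sided` p-values through `bonferroni`.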