Journal: BMC Bioinformatics

iSeg: an efficient algorithm for segmentation of genomic and epigenomic data

Abstract

Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems. We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first uses dynamic programming to identify candidate segments and test them for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during the searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. Because its objective function is based on the p-values of the segments, the algorithm can serve as a general computational framework that can be combined with different assumptions about the distribution of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg on both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also computationally efficient and well suited to large numbers of input profiles and to data with very long sequences.
We have developed an efficient general-purpose segmentation tool and shown that its results are comparable to, or more accurate than, those of many of the most popular segment-calling algorithms used in contemporary genomic data analysis. iSeg is capable of analyzing datasets that have both positive and negative values. Tunable parameters allow users to readily adjust the statistical stringency to best match the biological nature of individual datasets, including widely or sparsely mapped genomic datasets or those with non-normal distributions.
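
The idea of scoring candidate segments by p-value can be illustrated with a small sketch. This is not iSeg's implementation: the function names, the zero baseline, the window-length limits, and the toy profile are all assumptions made for illustration. It scans windows exhaustively rather than using the dynamic programming the abstract describes, and it approximates the Student t distribution with a normal distribution to stay within the Python standard library, which overstates significance for short segments.

```python
import math
from statistics import mean, stdev


def segment_p_value(values, start, end, baseline=0.0):
    """Two-sided p-value for the mean of values[start:end] differing from baseline.

    Computes a one-sample t statistic and converts it to a p-value with a
    normal approximation (an assumption; a real t-test would use the
    Student t distribution with len(segment) - 1 degrees of freedom).
    """
    seg = values[start:end]
    n = len(seg)
    if n < 2:
        return 1.0  # too short to estimate a standard deviation
    sd = stdev(seg)
    if sd == 0.0:
        # Degenerate case: a constant segment has no spread to test against.
        return 0.0 if mean(seg) != baseline else 1.0
    t = (mean(seg) - baseline) / (sd / math.sqrt(n))
    return math.erfc(abs(t) / math.sqrt(2))


def most_significant_segment(values, min_len=3, max_len=10):
    """Return (p, start, end) for the window with the smallest p-value.

    Brute-force scan over every window length in [min_len, max_len];
    hypothetical helper, quadratic in the worst case.
    """
    best = (1.0, 0, 0)
    for length in range(min_len, max_len + 1):
        for start in range(0, len(values) - length + 1):
            p = segment_p_value(values, start, start + length)
            if p < best[0]:
                best = (p, start, start + length)
    return best


# Toy profile: noise around 0 with an elevated stretch at indices 5..9.
profile = [0.1, -0.2, 0.0, 0.15, -0.1,
           2.1, 1.9, 2.2, 2.0, 1.8,
           0.05, -0.15, 0.1, -0.05, 0.2]

p, start, end = most_significant_segment(profile, min_len=3, max_len=6)
print(f"best window: [{start}, {end}) with approximate p = {p:.3g}")
```

Because the objective is just a p-value, swapping `segment_p_value` for a test built on a different distributional assumption leaves the search loop unchanged, which is the sense in which the abstract calls the method a general framework.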