iSeg: an efficient algorithm for segmentation of genomic and epigenomic data

Senthil B. Girimurugan; Yuhang Liu; Pei-Yau Lung; Daniel L. Vera; Jonathan H. Dennis; Hank W. Bass; Jinfeng Zhang

Abstract

Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems. We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first utilizes dynamic programming to identify candidate segments and test for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. By using an objective function based on the p-values of the segments, the algorithm can serve as a general computational framework to be combined with different assumptions on the distributions of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg using both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also very computationally efficient, well suited for large numbers of input profiles and data with very long sequences. 
We have developed an efficient general-purpose segmentation tool and shown that its results are comparable to, or more accurate than, those of many of the most popular segment-calling algorithms used in contemporary genomic data analysis. iSeg can analyze datasets that contain both positive and negative values. Tunable parameters allow users to readily adjust the statistical stringency to best match the biological nature of individual datasets, including widely or sparsely mapped genomic datasets and those with non-normal distributions.
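The general idea of scoring candidate segments by p-value and keeping only non-overlapping significant ones can be illustrated with a minimal Python sketch. This is not the iSeg implementation (which is written in C++), and several simplifications are assumptions made here for illustration: the function names are hypothetical, the noise level `sigma` is taken as known so that a z-test stands in for the t-test mentioned in the abstract, candidates are restricted to a few fixed window lengths, overlap is checked with a plain linear scan rather than the coupled balanced binary trees the abstract describes, and the refinement and merging stages are omitted.

```python
import math
from itertools import accumulate


def segment_p_value(prefix, start, end, sigma):
    """Two-sided p-value for the mean of values[start:end] being zero.

    Assumption of this sketch: the noise standard deviation `sigma` is
    known, so a z-statistic is used in place of a t-statistic.
    """
    n = end - start
    mean = (prefix[end] - prefix[start]) / n  # O(1) window mean
    z = mean / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def significant_segments(values, sigma, window_lengths, alpha):
    """Return non-overlapping (p, start, end) segments with p < alpha."""
    # prefix[i] holds the sum of values[:i].
    prefix = [0.0] + list(accumulate(values))

    # Score every window of every allowed length.
    candidates = []
    for w in window_lengths:
        for start in range(len(values) - w + 1):
            p = segment_p_value(prefix, start, start + w, sigma)
            if p < alpha:
                candidates.append((p, start, start + w))

    # Greedy selection: most significant first, skipping any candidate
    # that overlaps a segment already chosen (linear scan for brevity).
    candidates.sort()
    chosen = []
    for p, start, end in candidates:
        if all(end <= s or start >= e for _, s, e in chosen):
            chosen.append((p, start, end))
    return sorted(chosen, key=lambda seg: seg[1])


# Toy profile: a raised block of five points on a flat baseline.
profile = [0.0] * 10 + [2.0] * 5 + [0.0] * 10
segments = significant_segments(
    profile, sigma=1.0, window_lengths=[3, 5, 8], alpha=1e-3
)
print([(start, end) for _, start, end in segments])  # [(10, 15)]
```

Here `alpha` plays the role of a stringency parameter: lowering it keeps fewer, more strongly supported segments. Because the score uses the absolute value of the statistic, segments with negative means are detected in the same way as positive ones.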


Bibliographic record

Source: BMC Bioinformatics, 2018, Issue 1
Authors: Senthil B. Girimurugan; Yuhang Liu; Pei-Yau Lung; Daniel L. Vera; Jonathan H. Dennis; Hank W. Bass; Jinfeng Zhang
Original format: PDF
Classification (Chinese Library Classification): Biological sciences
