IEEE/ACM Transactions on Computational Biology and Bioinformatics

Intra-Cluster Distance Minimization in DNA Methylation Analysis Using an Advanced Tabu-Based Iterative k-Medoids Clustering Algorithm (T-CLUST)

Abstract

Recent advances in DNA methylation profiling have paved the way for understanding the epigenetic mechanisms underlying various diseases such as cancer. Conventional distance-based clustering algorithms (e.g., hierarchical and k-means clustering) have been heavily used in such profiling because they are fast enough for high-throughput analysis, but their greedy search commonly converges to suboptimal solutions and/or trivial clusters. Hence, methodologies are needed to improve the quality of the clusters these algorithms form without sacrificing their speed. In this study, we introduce three related algorithms for a complete high-throughput methylation analysis: a variance-based dimension reduction algorithm to handle the high dimensionality of the data, an outlier detection algorithm to identify outliers in the data, and an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional k-medoids algorithm. The performance of the proposed algorithms is demonstrated on nine real DNA methylation datasets obtained from the Gene Expression Omnibus DataSets database. The cluster identification accuracy of the proposed algorithms is higher than that of hierarchical and k-means clustering, as well as the conventional methods. The algorithms are implemented in MATLAB and are available at http://www.coe.miami.edu/simlab/tclust.html.
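The abstract names the building blocks but not their details, so the following is only a rough illustration of the general ideas, not the authors' T-CLUST or their MATLAB code. It is a minimal Python sketch that assumes Euclidean distance, a plain swap neighborhood, and a fixed tabu tenure. The function names (`top_variance_features`, `tabu_k_medoids`) and the parameters `keep`, `iterations`, `tenure`, and `seed` are hypothetical. The outlier detection step is left out because the abstract does not say how it works.

```python
import random
from statistics import pvariance


def top_variance_features(rows, keep):
    """Return the indices of the `keep` columns with the largest variance."""
    cols = list(zip(*rows))
    ranked = sorted(range(len(cols)), key=lambda j: pvariance(cols[j]), reverse=True)
    return sorted(ranked[:keep])


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def tabu_k_medoids(points, k, iterations=50, tenure=5, seed=0):
    """Swap-based k-medoids search with a tabu list (illustrative assumption)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), k)
    best, best_cost = list(current), total_cost(points, current)
    tabu = {}  # point index -> last iteration at which it is still tabu

    for it in range(iterations):
        candidates = []
        for pos in range(k):
            for cand in range(len(points)):
                if cand in current:
                    continue
                trial = current[:pos] + [cand] + current[pos + 1:]
                cost = total_cost(points, trial)
                is_tabu = tabu.get(cand, -1) >= it
                # Aspiration: a tabu swap is allowed only if it beats the best so far.
                if is_tabu and cost >= best_cost:
                    continue
                candidates.append((cost, pos, cand))
        if not candidates:
            break
        # Take the best admissible swap even when it is worse than the current
        # solution; this is what lets the search leave a local optimum.
        cost, pos, cand = min(candidates)
        tabu[current[pos]] = it + tenure  # the removed medoid may not return yet
        current[pos] = cand
        if cost < best_cost:
            best, best_cost = list(current), cost

    return sorted(best), best_cost
```

A typical call would first select columns with `top_variance_features(rows, keep)`, project each row onto those columns, and then pass the reduced rows to `tabu_k_medoids`. The returned medoid indices refer to positions in the input list. Each iteration evaluates every possible swap, so this sketch is suitable only for small inputs.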
