Clustering Algorithms Optimizer: A Framework for Large Datasets

Abstract

Clustering algorithms are employed in many bioinformatics tasks, including the categorization of protein sequences and the analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) they rely on predetermined parameter tuning, such as a priori knowledge of the number of clusters; (ii) they involve nondeterministic procedures that yield inconsistent outcomes. A framework that addresses these shortcomings is therefore desirable. We provide a data-driven framework that consists of two interrelated steps: the first is SVD-based dimension reduction, and the second is automated tuning of the algorithm's parameter(s). The dimension-reduction step is adapted to run efficiently on very large datasets. The optimal parameter setting is identified according to an internal evaluation criterion, the Bayesian Information Criterion (BIC). The framework can incorporate most clustering algorithms and improve their performance. In this study we illustrate its effectiveness by incorporating the standard K-Means and the Quantum Clustering algorithms. The resulting implementations are applied to several gene-expression benchmarks with significant success.
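The two steps described in the abstract (SVD-based dimension reduction, then choosing a clustering parameter by BIC) can be sketched for the K-Means case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy's plain thin SVD rather than the large-dataset variant the abstract mentions, keeps a fixed number of components, runs K-Means with k-means++ seeding and a fixed random seed so results are repeatable, and scores each K with a hard-assignment spherical-Gaussian BIC whose exact form is assumed here because the abstract does not give the formula. The function names (`svd_reduce`, `kmeans`, `kmeans_bic`, `choose_k`) and the toy data are hypothetical, and Quantum Clustering is not covered.

```python
import numpy as np


def svd_reduce(X, n_components):
    """Project mean-centered rows of X onto the top singular directions (PCA scores)."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; U * S gives the principal-component scores.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-Means with k-means++ seeding; a fixed seed makes results repeatable."""
    rng = np.random.default_rng(seed)
    best_labels, best_rss = None, np.inf
    for _ in range(n_init):
        # k-means++: each new center is drawn with probability proportional to
        # the squared distance to the nearest center chosen so far.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
            # Move each center to its cluster mean; an empty cluster keeps its center.
            new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        rss = ((X - centers[labels]) ** 2).sum()  # within-cluster sum of squares
        if rss < best_rss:
            best_labels, best_rss = labels, rss
    return best_labels, best_rss


def kmeans_bic(X, labels, rss, k):
    """BIC (lower is better) of a hard-assignment spherical-Gaussian model.

    ASSUMPTION: one common BIC form for K-Means; the paper's exact formula
    is not stated in the abstract.
    """
    n, d = X.shape
    sigma2 = rss / (n * d)  # shared maximum-likelihood variance per dimension
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]
    log_lik = (np.sum(counts * np.log(counts / n))        # mixing-weight term
               - 0.5 * n * d * np.log(2 * np.pi * sigma2)  # Gaussian normalizer
               - 0.5 * n * d)                              # equals rss / (2 * sigma2)
    n_params = k * d + (k - 1) + 1  # centers, mixing weights, one variance
    return -2.0 * log_lik + n_params * np.log(n)


def choose_k(X, n_components=2, k_max=8):
    """Reduce X by SVD, cluster for each K, and keep the K with the lowest BIC."""
    Z = svd_reduce(X, n_components)
    scores = {}
    for k in range(1, k_max + 1):
        labels, rss = kmeans(Z, k)
        scores[k] = kmeans_bic(Z, labels, rss, k)
    return min(scores, key=scores.get), scores


# Hypothetical toy data: three well-separated blobs in 10 dimensions.
rng = np.random.default_rng(42)
true_centers = np.zeros((3, 10))
true_centers[1, 0] = 6.0
true_centers[2, 1] = 6.0
X = np.vstack([c + 0.3 * rng.standard_normal((50, 10)) for c in true_centers])
best_k, scores = choose_k(X)
print("selected K =", best_k)
```

In this sketch a lower BIC is treated as better (some texts flip the sign and maximize instead). The penalty term `n_params * ln(n)`, together with the mixing-weight term, is what stops K from growing until every point forms its own cluster, while the fixed seed and multiple restarts address the run-to-run inconsistency the abstract attributes to nondeterministic procedures.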

Bibliographic details

Source
Bioinformatics Research and Applications (Lecture Notes in Bioinformatics, vol. 4463), 2007, pp. 85-96 (12 pages)
Conference location: Atlanta, GA (US)
Authors
Roy Varshavsky; David Horn; Michal Linial
Author affiliations

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel;

School of Physics and Astronomy, Tel Aviv University, Israel;

Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel;

Original format: PDF
Language: English
Chinese Library Classification: Bioengineering (biotechnology)
Keywords
Bayesian information criterion (BIC); quantum clustering (QC); optimal K-Means (OKM); optimal quantum clustering (OQC); principal component analysis (PCA); singular value decomposition (SVD)

Similar documents

1. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms [J]. Lin Kuan-Cheng, Hsieh Yi-Hsiu. Journal of Medical Systems, 2015, no. 10.

2. A Comparative Study of CN2 Rule and SVM Algorithm and Prediction of Heart Disease Datasets Using Clustering Algorithms [J]. Ramaraj M., Antony Selvadoss Thanamani. Network and Complex Systems, 2013, no. 10.

3. Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets [J]. Bryar A. Hassan, Tarik A. Rashid, Seyedali Mirjalili. Data in Brief, 2021.

4. An Improved Clustering Algorithm Based on k-Means and Artificial Bee Colony Optimization for Datasets that Contain Outliers [C]. Anu Balachandran, K.A. Abdul Nazeer. 2018 International Conference on Computing, Power and Communication Technologies, 2018.

5. Supervised precision ordinal clustering: A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit, 2014.

6. Single-cell RNA-seq clustering: datasets, models, and algorithms [O]. Lihong Peng, Xiongfei Tian, Geng Tian, 2020.

7. Clustering Algorithms Optimizer: A Framework for Large Datasets [O]. Roy Varshavsky, David Horn, Michal Linial, 2008.

8. Evaluation of Hierarchical Clustering Algorithms for Document Datasets [R]. Zhao, Y., Karypis, G., 2002.
