Published in: Bioinformatics Research and Applications; Lecture Notes in Bioinformatics; 4463

Clustering Algorithms Optimizer: A Framework for Large Datasets

Abstract

Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) reliance on predetermined parameter tuning, such as a priori knowledge of the number of clusters; (ii) use of nondeterministic procedures that yield inconsistent outcomes. Thus, a framework that addresses these shortcomings is desirable. We provide a data-driven framework that includes two interrelated steps. The first is SVD-based dimension reduction and the second is automated tuning of the algorithm's parameter(s). The dimension reduction step is efficiently adjusted for very large datasets. The optimal parameter setting is identified according to an internal evaluation criterion, the Bayesian Information Criterion (BIC). This framework can incorporate most clustering algorithms and improve their performance. In this study we illustrate the effectiveness of this platform by incorporating the standard K-Means and the Quantum Clustering algorithms. The implementations are applied to several gene-expression benchmarks with significant success.
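The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details, so every name below (`svd_reduce`, `farthest_first_init`, `kmeans`, `bic_score`, `select_k`) is hypothetical, the centering before the SVD is an assumption, the deterministic farthest-first seeding is just one way to avoid run-to-run variation, and the BIC is one common spherical-Gaussian formulation in which lower values are better. Quantum Clustering is not covered.

```python
import numpy as np


def svd_reduce(X, n_components):
    """Step 1 (assumed form): project rows of X onto the leading singular directions."""
    Xc = X - X.mean(axis=0)  # assumption: center features before the SVD
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def farthest_first_init(X, k):
    """Deterministic seeding: the point nearest the mean, then farthest-first."""
    first = np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1))
    centers = [X[first]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = np.argmax(d2)
        centers.append(X[nxt])
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return np.array(centers)


def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations from a deterministic start."""
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


def bic_score(X, labels, centers):
    """BIC under a hard-assignment spherical Gaussian model (lower is better)."""
    n, d = X.shape
    k = centers.shape[0]
    sse = ((X - centers[labels]) ** 2).sum()
    var = max(sse / (d * max(n - k, 1)), 1e-12)  # shared variance, guarded against 0
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]
    log_lik = ((counts * np.log(counts / n)).sum()
               - 0.5 * n * d * np.log(2 * np.pi * var)
               - sse / (2 * var))
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, one variance
    return -2.0 * log_lik + n_params * np.log(n)


def select_k(X, k_values):
    """Step 2 (assumed form): pick the number of clusters with the lowest BIC."""
    scores = {}
    for k in k_values:
        labels, centers = kmeans(X, k)
        scores[k] = bic_score(X, labels, centers)
    return min(scores, key=scores.get), scores
```

A typical call would reduce the data first and then scan a range of cluster counts, for example `Z = svd_reduce(X, 2)` followed by `best_k, scores = select_k(Z, range(1, 7))`. The number of retained components is itself a parameter that this sketch leaves to the caller.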
