Large-scale clustering: Algorithms and applications.

Abstract

Clustering is a central problem in unsupervised learning for discovering interesting patterns in the underlying data. Though there have been numerous studies on clustering methods, the focus of this dissertation is on developing efficient clustering algorithms for large-scale applications such as text mining, network analysis, image segmentation and bioinformatics.

We first present a time- and memory-efficient technique for the entire process of text clustering, including the creation of the vector space model for documents. This efficiency is obtained by (i) a memory-efficient multi-threaded preprocessing scheme, and (ii) a fast clustering algorithm that fully exploits the sparsity of the data set. We show that this entire process takes time that is linear in the size of the document collection.

Clustering algorithms that are based on heuristics can get trapped in inferior local optima, yielding qualitatively poor results. As the second part of our work, we propose the use of local search and annealing to improve the quality of the clustering results. In local search, we create a chain of incremental point moves that leads the objective function out of local optima; in annealing, we perturb the cluster centers after the clusters have stabilized. The effectiveness of these techniques is illustrated in text clustering and gene expression analysis.

Data in many domains, such as cluster analysis of the world wide web or circuit partitioning, are represented as graphs. Clustering is often used to find and analyze structural and functional properties of these graphs. In the last part of the dissertation, we present an efficient, high-quality multilevel kernel-based graph clustering algorithm, which outperforms previous state-of-the-art spectral methods in quality and runs hundreds or even thousands of times faster. Our multilevel graph clustering algorithm is based on a theoretical connection with the weighted kernel k-means clustering algorithm. We empirically demonstrate that our algorithm is efficient and effective on large social networks, protein interaction networks and image segmentation.
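
The abstract names weighted kernel k-means as the basis of the graph clustering algorithm but does not spell it out. The following is a minimal sketch of the generic batch form of weighted kernel k-means, not the dissertation's multilevel implementation. The function name `weighted_kernel_kmeans`, its parameters, and the toy data are assumptions made for illustration. How a kernel matrix is derived from a graph is not covered here, so the example uses a linear kernel on six one-dimensional points, where the method reduces to ordinary k-means.

```python
def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means on a precomputed kernel matrix (illustrative sketch).

    K      : n x n symmetric kernel matrix as a list of lists
    w      : n positive point weights
    labels : initial cluster index (0..k-1) for each point
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        mass = [sum(w[j] for j in m) for m in members]
        # Squared norm of each weighted cluster mean in kernel space;
        # this term does not depend on the point being assigned.
        within = [
            sum(w[j] * w[l] * K[j][l] for j in m for l in m) / mass[c] ** 2 if m else 0.0
            for c, m in enumerate(members)
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(w[j] * K[i][j] for j in m) / mass[c]
                # Squared kernel-space distance from point i to the mean of cluster c.
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data (assumed for illustration): two well-separated groups on a line.
x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
K = [[a * b for b in x] for a in x]   # linear kernel
w = [1.0] * len(x)                    # uniform weights
result = weighted_kernel_kmeans(K, w, labels=[0, 1, 0, 1, 0, 1], k=2)
print(result)  # [0, 0, 0, 1, 1, 1]
```

Working only with `K` and `w` is the point of the sketch: the points never need explicit coordinates, which is what allows a kernel built from a graph to be substituted for the linear kernel used here.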

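Bibliographic record

Author: Guan, Yuqiang
Author affiliation: The University of Texas at Austin
Degree-granting institution: The University of Texas at Austin
Subject: Computer Science
Degree: Ph.D.
Year: 2006
Pages: 174
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology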
