首页> 外文学位 >Algorithms for NP-hard Optimization Problems and Cluster Analysis
【24h】

Algorithms for NP-hard Optimization Problems and Cluster Analysis

机译:NP难优化问题的算法和聚类分析

获取原文
获取原文并翻译 | 示例

摘要

The set cover problem, weighted set cover problem, minimum dominating set problem and minimum weighted dominating set problem are all classical NP-hard optimization problems of great importance in both theory and real applications. Since the exact algorithms, which require exhaustive exploration of exponentially many options, are infeasible in practice, approximation algorithms and heuristic algorithms are widely used to find reasonably good solutions in polynomial time. I propose novel algorithms for these problems. My algorithms for the weighted set cover and minimum weighted dominating set problems are based on a three-step strategy. For the weighted set cover problem, in the first step, we reserve the sets indispensable for the optimal solution and reduce the problem size. In the second step, we build a robust solution with a novel greedy heuristic. Sets are iteratively selected according to a measure which integrates the weight, the coverage gain for the current iteration and the global coverage capacity of each set. It favors the sets that have smaller weights and better extend or consolidate the coverage, especially on the items that are contained in less sets. Since the obtained solution tends to have a robust coverage, in the third step, we further improve it by removing the redundant sets in an efficient way. For the minimum weighted dominating set problem, we first reserve the indispensable vertices for the optimal solution. Then we convert it into a weighted set cover problem to solve it. These two algorithms can be used to solve the set cover problem and minimum dominating set problem by simply considering all the sets or vertices as having the same weights. Extensive experimental evaluations on a large number of synthetic and real-world set cover instances and graphs from many domains demonstrate the superiority of my algorithms over state-of-the-art.;Cluster analysis is a fundamental problem in data analysis, and has extensive applications in artificial intelligence, statistics and even in social sciences. The goal is to partition the data objects into a set of groups (clusters) such that objects in the same group are similar, while objects in different groups are dissimilar.;Most of the existing algorithms for clustering are designed to handle data with only one type of attributes, e.g. continuous, categorical or ordinal. Mixed data clustering has received relatively less attention, despite the fact that data with mixed types of attributes are common in real applications. I propose a novel affinity learning based framework for mixed data clustering, which includes: how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented with several abstract objects defined according to the specific data type and values. Each attribute value is transformed into the initial affinities between the data point and the abstract objects of attribute. I refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited for clustering. Alternatively, the refined affinities between data points and the abstract objects of attributes can be transformed into new data features for clustering. Experimental results on many real world data sets demonstrate that the proposed framework is effective for mixed data clustering. This work was published in our IJCAI 2017 paper Li & Latecki (2017).;Clustering aggregation, also known as consensus clustering or clustering ensemble, aims to find a single superior clustering from a number of input clusterings obtained by different algorithms with different parameters. I formulate clustering aggregation as a special instance of the maximum-weight independent set (MWIS) problem. For a given data set, an attributed graph is constructed from the union of the input clusterings. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the data set together. I formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. (Abstract shortened by ProQuest.).
机译:集合覆盖问题,加权集合覆盖问题,最小控制集问题和最小加权控制集问题都是经典的NP硬优化问题,在理论和实际应用中均具有重要意义。由于在实际中不可行,需要详尽地探索指数选项,因此近似算法和启发式算法被广泛用于在多项式时间内找到合理的好解。我针对这些问题提出了新颖的算法。我针对加权集覆盖率和最小加权支配集问题的算法基于三步策略。对于加权集覆盖问题,在第一步中,我们保留对于最佳解决方案必不可少的集,并减小问题的大小。在第二步中,我们使用新颖的贪婪启发式方法构建健壮的解决方案。根据将权重,当前迭代的覆盖范围增益和每个集合的全局覆盖能力进行积分的度量来迭代选择集合。它偏爱具有较小权重并更好地扩展或合并覆盖范围的集合,尤其是包含在较少集合中的项目上。由于获得的解决方案倾向于具有可靠的覆盖范围,因此在第三步中,我们通过以有效的方式删除冗余集来进一步改进它。对于最小加权支配集问题,我们首先保留必不可少的顶点,以获取最优解。然后我们将其转换为加权集覆盖问题以解决该问题。通过简单地将所有集合或顶点视为具有相同权重,可以将这两种算法用于解决集合覆盖问题和最小支配集合问题。对来自许多领域的大量合成和现实世界的套图实例和图形进行的广泛实验评估证明了我的算法优于最新技术的优势。聚类分析是数据分析中的一个基本问题,并且具有广泛的应用价值在人工智能,统计甚至社会科学中的应用。目标是将数据对象划分为一组组(集群),以使同一组中的对象相似,而不同组中的对象彼此不相似。大多数现有的聚类算法均设计为仅处理一个数据属性类型,例如连续的,分类的或有序的。尽管具有混合类型的属性的数据在实际应用程序中很常见,但混合数据集群的关注程度相对较低。我提出了一个用于混合数据聚类的基于亲和力学习的新颖框架,该框架包括:如何处理具有混合类型属性的数据,如何学习数据点之间的亲和力以及如何利用所学习的亲和力进行聚类。在提出的框架中,每个原始数据属性都由几个根据特定数据类型和值定义的抽象对象表示。每个属性值都转换为数据点和属性抽象对象之间的初始亲和力。通过考虑所有数据点的属性值之间的相互联系,我细化了这些亲和力并推断出数据点之间的未知亲和力。可以利用数据点之间的推断亲和力进行聚类。或者,可以将数据点与属性的抽象对象之间的精确关联性转换为用于聚类的新数据特征。在许多现实世界数据集上的实验结果表明,该框架对于混合数据聚类是有效的。这项工作发表在我们的IJCAI 2017论文Li&Latecki(2017)中;聚类聚合,也称为共识聚类或聚类集成,旨在从通过不同算法,使用不同参数获得的多个输入聚类中找到单个高级聚类。我将集群聚合公式化为最大权重独立集(MWIS)问题的特殊实例。对于给定的数据集,根据输入聚类的并集构造属性图。代表不同簇的顶点通过内部指数加权,该指数既测量凝聚力又测量分离度。边连接其相应簇重叠的顶点。直观地,可以通过选择将数据集划分在一起的非重叠群集的最佳子集来获得最佳聚合群集。我将这种直觉形式化为属性图上的MWIS问题,即找到相互不相邻的顶点的最重子集。这个MWIS问题表现出特殊的结构。 (摘要由ProQuest缩短。)。

著录项

  • 作者

    Li, Nan.;

  • 作者单位

    Temple University.;

  • 授予单位 Temple University.;
  • 学科 Computer science.
  • 学位 Ph.D.
  • 年度 2017
  • 页码 101 p.
  • 总页数 101
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号