首页> 外文会议>Computational science and its applications-ICCSA 2009 >A Fast Approximation Algorithm for the k Partition-Distance Problem
【24h】

A Fast Approximation Algorithm for the k Partition-Distance Problem

机译:k分割距离问题的快速近似算法

获取原文
获取原文并翻译 | 示例

摘要

Given a set of elements N, a partition consists on dividing the set of elements into two or more disjoint clusters that cover all elements. A cluster contains a non-empty subset of elements. The number of clusters of a partition is less than or equal to N. Different partitioning algorithms for the same application will produce different partitions from the same set of elements. To compute the distance and find the consensus partition (also called as consensus clustering) between two or more partitions are important and interesting problems that arise in many applications such as bioinformatics and data mining. However, different distance functions between two or more partitions will usually need to be computed by different algorithms. In this paper, we discuss the k partition-distance problem which can be applied in bioinformatics. Given a set of elements N with k partitions, the k partition-distance problem is to delete the minimum number of elements from each partition such that all remaining partitions become identical. However, this problem has been shown to be NP-complete when k > 2. We will present the first known approximation algorithm with performance ratio 2 to solve this problem in O(k * p * |N|) time, where p is the maximum number of clusters of these k partitions. Then we perform our algorithm for simulation of random data and actual the set of organisms based on DNA markers. It will show that our approximation solution is at most twice the partition-distance of the optimal solution in practice.
机译:给定一组元素N,分区包括将一组元素划分为两个或多个覆盖所有元素的不相交簇。集群包含元素的非空子集。一个分区的簇数小于或等于N。针对同一应用程序的不同分区算法将根据同一组元素生成不同的分区。计算距离并找到两个或多个分区之间的共识分区(也称为共识聚类)是重要且有趣的问题,这些问题在许多应用程序中出现,例如生物信息学和数据挖掘。但是,通常需要通过不同的算法来计算两个或多个分区之间的距离函数。在本文中,我们讨论了可应用于生物信息学的k分区距离问题。给定一组具有k个分区的元素N,第k个分区距离问题是从每个分区中删除最少数量的元素,以使所有其余分区都相同。然而,当k> 2时,这个问题已经证明是NP完全的。我们将提出第一个已知的性能比为2的近似算法,以在O(k * p * | N |)的时间内解决此问题,其中p是这k个分区的最大群集数。然后,我们执行我们的算法,以模拟随机数据并基于DNA标记实际设置一组生物。这将表明我们的近似解在实践中最多是最佳解的分区距离的两倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号