首页> 外文期刊>Pattern recognition and image analysis: advances in mathematical theory and applications in the USSR >An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence
【24h】

An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence

机译:一个精确地搜索最大群集在一个序列的整数值问题中的最大群集

获取原文
获取原文并翻译 | 示例
       

摘要

We analyze mathematical aspects of one of the fundamental data analysis problems consisting in the search (selection) for the subset with the largest number of similar elements among a collection of objects. In particular, the problem appears in connection with the analysis of data in the form of time series (discrete signals). One of the problems in modeling this challenge is considered, namely, the problem of finding the cluster of the largest size (cardinality) in a 2-partition of a finite sequence of points in Euclidean space into two clusters (subsequences) under two constraints. The first constraint is on the choice of the indices of elements included in the clusters. This constraint simulates the set of time-admissible configurations of similar elements in the observed discrete signal. The second constraint is imposed on the value of the quadratic clustering function. This constraint simulates the level of intracluster proximity of objects. The clustering function under the second constraint is the sum (over both clusters) of the intracluster sums of squared distances between the cluster elements and its center. The center of one of the clusters is unknown and defined as the centroid (the arithmetic mean over all elements of this cluster). The center of the other cluster is the origin. Under the first constraint, the difference between any two subsequent indices of elements contained in a cluster with an unknown center is bounded above and below by some constants. It is established in the paper that the optimization problem under consideration, which models one of the simplest significant problems of data analysis, is strongly NP-hard. We propose an exact algorithm for the case of a problem with integer coordinates of its input points. If the dimension of the space is bounded by a constant, then the algorithm is pseudopolynomial.
机译:我们分析了一个基本数据分析问题之一的数学方面,该问题包括具有最大数量的对象之间的相似元素数量的子集中的搜索(选择)。特别是,问题出现在时间序列(离散信号)形式的数据分析。考虑建模这种挑战的问题之一,即,在两个约束下,在欧几里德空间中的有限点的2分区中找到最大尺寸(基数)的群集的问题。第一个约束是选择集群中包含的元素的指标。该约束模拟了观察到的离散信号中的类似元件的一组时间允许配置。对二次聚类功能的值施加第二约束。该约束模拟对象的内部内部的级别。第二个约束下的聚类功能是集群元素与其中心之间的平方距离的跨界距离的总和(群集)。其中一个集群的中心未知并定义为质心(算术平均值在此集群的所有元素上)。另一个集群的中心是原点。在第一个约束下,与未知中心的群集中包含的元素的任何两个后续指标之间的差异在上方和下方由某些常数界定。在本文中建立的是,正在考虑的优化问题,其中模型数据分析最简单的重要问题之一,是强烈的NP-COLLE。我们为其输入点的整数坐标提出了一个精确的算法。如果空间的尺寸由常数界定,则算法是假验二极管的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号