首页> 外文会议>Annual Allerton Conference on Communication, Control, and Computing >If it ain't broke, don't fix it: Sparse metric repair
【24h】

If it ain't broke, don't fix it: Sparse metric repair

机译:如果没有损坏,请不要修复:稀疏公制维修

获取原文
获取外文期刊封面目录资料

摘要

Many modern data-intensive computational problems either require, or benefit from distance or similarity data that adhere to a metric. The algorithms run faster or have better performance guarantees. Unfortunately, in real applications, the data are messy and values are noisy. The distances between the data points are far from satisfying a metric. Indeed, there are a number of different algorithms for finding the closest set of distances to the given ones that also satisfy a metric (sometimes with the extra condition of being Euclidean). These algorithms can have unintended consequences; they can change a large number of the original data points, and alter many other features of the data. The goal of sparse metric repair is to make as few changes as possible to the original data set or underlying distances so as to ensure the resulting distances satisfy the properties of a metric. In other words, we seek to minimize the sparsity (or the ℓ0“norm”) of the changes we make to the distances subject to the new distances satisfying a metric. We give three different combinatorial algorithms to repair a metric sparsely. In one setting the algorithm is guaranteed to return the sparsest solution and in the other settings, the algorithms repair the metric. Without prior information, the algorithms run in time proportional to the cube of the number of input data points and, with prior information we can reduce the running time considerably.
机译:许多现代的数据密集型计算问题都需要遵循度量标准的距离或相似性数据,或者从中受益。该算法运行速度更快或具有更好的性能保证。不幸的是,在实际应用中,数据混乱并且值嘈杂。数据点之间的距离远不能满足度量标准。确实,有许多不同的算法可以找到与给定距离最接近的一组距离,这些距离也满足一个度量标准(有时具有欧几里得的额外条件)。这些算法可能会产生意想不到的后果。他们可以更改大量原始数据点,并更改数据的许多其他功能。稀疏量度修复的目标是对原始数据集或基础距离进行尽可能少的更改,以确保所得的距离满足量度的属性。换句话说,我们试图将稀疏性(或minimize 0 我们对距离所做的更改(“范数”)以符合度量标准的新距离为准。我们给出了三种不同的组合算法来稀疏地修复度量。在一个设置中,保证算法将返回最稀疏的解决方案,而在其他设置中,算法将修复度量。在没有先验信息的情况下,算法的运行时间与输入数据点数量的立方成正比,有了先验信息,我们可以大大减少运行时间。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号