Journal: Computational Intelligence

AN EFFICIENT REPRESENTATION MODEL OF DISTANCE DISTRIBUTION BETWEEN UNCERTAIN OBJECTS


Abstract

In this paper, we consider the problem of efficiently computing the distance between uncertain objects. In many real-life applications, data such as sensor readings and weather forecasts are uncertain when they are collected or produced. An uncertain object has a probability distribution function (PDF) that represents the probability that it is actually located at a particular location. Fast and accurate distance computation between uncertain objects is important for many uncertain query evaluation tasks (e.g., range queries and nearest-neighbor queries) and uncertain data mining tasks (e.g., classification, clustering, and outlier detection). However, existing approaches compute distances between samples of the two objects, which is computationally intensive. On the one hand, it is expensive to calculate and store the actual distribution of the possible distance values between two uncertain objects. On the other hand, the expected distance (the weighted average of the pairwise distances among samples of two uncertain objects) provides very limited information and restricts the definition and usefulness of queries and mining tasks. We propose several approaches to calculate the mean of the actual distance distribution and to approximate its variance. Based on these, we suggest that the actual distance distribution can be approximated by a standard distribution such as the Gaussian or Gamma distribution. Experiments on real and synthetic data show that our approach produces an approximation in a very short time with acceptable accuracy (about 90%). We suggest that it is practical for the research community to define and develop more powerful queries and data mining tasks based on the distance distribution instead of the expected distance.
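
To make the quantities in the abstract concrete, the following is a minimal Python sketch of the brute-force, sample-based baseline: it computes the expected distance as the weighted average of pairwise sample distances, computes the variance of the same distribution, and then fits a Gamma distribution by moment matching. It is not the authors' efficient method, which the abstract does not describe. The function names, the weighted-sample representation, the Euclidean metric, and the assumption that the two objects are independent (so pair weights multiply) are all illustrative assumptions.

```python
import math
from itertools import product


def distance_moments(samples_a, samples_b):
    """Mean and variance of the distance between two uncertain objects.

    Each object is a list of (point, probability) samples whose
    probabilities sum to 1. Assumption: the objects are independent,
    so a pair of samples has probability wp * wq.
    """
    mean = 0.0
    second_moment = 0.0
    for (p, wp), (q, wq) in product(samples_a, samples_b):
        d = math.dist(p, q)          # Euclidean distance (assumed metric)
        w = wp * wq
        mean += w * d                # weighted average = expected distance
        second_moment += w * d * d
    return mean, second_moment - mean * mean


def gamma_from_moments(mean, var):
    """Moment-matched Gamma parameters: shape * scale = mean,
    shape * scale**2 = var. Requires mean > 0 and var > 0."""
    shape = mean * mean / var
    scale = var / mean
    return shape, scale


# Two hypothetical uncertain objects, each with two equally likely locations.
a = [((0.0, 0.0), 0.5), ((0.0, 2.0), 0.5)]
b = [((3.0, 0.0), 0.5), ((3.0, 4.0), 0.5)]

mean, var = distance_moments(a, b)
shape, scale = gamma_from_moments(mean, var)
```

The double loop costs one distance evaluation per pair of samples, which is the expense the abstract attributes to existing approaches. A Gaussian approximation would use the same `mean` and `var` directly as its two parameters.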
