International Conference on Similarity Search and Applications

Taking Advantage of Highly-Correlated Attributes in Similarity Queries with Missing Values


Abstract

Incompleteness harms the quality of content-based retrieval and analysis in similarity queries. Missing data are usually handled with exclusion or with imputation methods that infer possible values to fill the gaps. However, such approaches can introduce bias into the data and discard useful information. Similarity queries cannot be performed over incomplete complex tuples, since distance functions are undefined over missing values. We propose the SOLID approach, which allows similarity queries in complex databases without the need for either data imputation or deletion. First, SOLID finds highly correlated metric spaces. Then, SOLID uses a weighted distance function to search by similarity over tuples of complex objects, using compatibility factors among metric spaces. Experimental results show that SOLID outperforms imputation methods at different missing rates. SOLID was up to 7.3% better in quality than the competitors when querying over incomplete tuples, reduced the error of similarity searches over incomplete data by 16.42%, and was up to 30.8 times faster than the closest competitor.
