Weighted Set Similarity: Queries and Updates


Abstract

Consider a universe of items, each of which is associated with a weight, and a database consisting of subsets of these items. Given a query set, a weighted set similarity query identifies either (i) all sets in the database whose normalized similarity to the query set is above a pre-specified threshold, or (ii) the sets in the database with the k highest similarity values to the query set. Weighted set similarity queries are useful in applications like data cleaning and integration for finding approximate matches in the presence of typographical mistakes, multiple formatting conventions, transformation errors, etc. We show that this problem has semantic properties that can be exploited to design index structures that support efficient algorithms for answering queries; these algorithms can achieve arbitrarily stronger pruning than the family of Threshold Algorithms. We describe how these index structures can be efficiently updated using lazy propagation in a way that gives strict guarantees on the quality of subsequent query answers. Finally, we illustrate that our proposed ideas work well in practice for real datasets.
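The two query types in the abstract can be made concrete with a small brute-force sketch. It is only a baseline that scans every set, not the index structures or pruning algorithms the paper proposes. The abstract does not say which normalized similarity is used, so weighted Jaccard is assumed here purely for illustration; the names `weighted_jaccard`, `threshold_query`, and `top_k_query` are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, Hashable, List, Tuple

Item = Hashable
SetId = int


def weighted_jaccard(a: FrozenSet[Item], b: FrozenSet[Item],
                     weight: Dict[Item, float]) -> float:
    """Assumed similarity: weight of the intersection over weight of the union."""
    union = sum(weight[x] for x in a | b)
    if union == 0:
        return 0.0
    return sum(weight[x] for x in a & b) / union


def threshold_query(db: Dict[SetId, FrozenSet[Item]], query: FrozenSet[Item],
                    weight: Dict[Item, float], tau: float) -> List[Tuple[SetId, float]]:
    """Query type (i): every set whose similarity to the query is above tau."""
    scored = ((sid, weighted_jaccard(s, query, weight)) for sid, s in db.items())
    hits = [(sid, sim) for sid, sim in scored if sim > tau]
    # Highest similarity first; ties broken by set id for a stable order.
    return sorted(hits, key=lambda p: (-p[1], p[0]))


def top_k_query(db: Dict[SetId, FrozenSet[Item]], query: FrozenSet[Item],
                weight: Dict[Item, float], k: int) -> List[Tuple[SetId, float]]:
    """Query type (ii): the k sets with the highest similarity to the query."""
    scored = ((sid, weighted_jaccard(s, query, weight)) for sid, s in db.items())
    return heapq.nlargest(k, scored, key=lambda p: p[1])


# Toy data (made up): rare tokens carry more weight than common ones.
weights = {"main": 5, "st": 2, "street": 2, "maine": 15, "park": 10, "ave": 3}
database = {
    1: frozenset({"main", "st"}),
    2: frozenset({"main", "street"}),
    3: frozenset({"park", "ave"}),
    4: frozenset({"maine", "st"}),
}
query_set = frozenset({"main", "st"})

print(threshold_query(database, query_set, weights, tau=0.5))
print(top_k_query(database, query_set, weights, k=2))
```

Both functions cost time linear in the size of the database per query, which is the cost an index with pruning is meant to avoid.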


Bibliographic details

Source: Data Engineering, ICDE, 2009 IEEE 25th International Conference on | 2009 | p. 1559 | 1 page
Author: Srivastava, Divesh
Original format: PDF
Chinese Library Classification: Industrial technology

Similar literature

1. Effective image retrieval based on hybrid features with weighted similarity measure and query image classification [J]. Vibhav Prakash Singh, Rajeev Srivastava. International journal of computational vision and robotics. 2018, no. 2.
2. A quadratic lower bound for Rocchio's similarity-based relevance feedback algorithm with a fixed query updating factor [J]. Chen ZX, Fu B, Abraham J. Journal of combinatorial optimization. 2010, no. 2.
3. SymDex: Increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing [J]. Tai D., Fang J. Journal of chemical information and modeling. 2012, no. 8.
4. Weighted Set Similarity: Queries and Updates [C]. Srivastava Divesh. IEEE International Conference on Data Engineering. 2009.
5. Efficient implementation of update and retrieval query sequences over large data sets in a native XML database [D]. Mikhaylov, Alexander. 2006.
6. Exploring Inter-Instance Relationships within the Query Set for Robust Image Set Matching [O]. Deyin Liu, Chengwu Liang, Zhiming Zhang. 2019.
7. Adaptive majority problems for restricted query graphs and for weighted sets [O]. Gábor Damásdi, Dániel Gerbner, Gyula O.H. Katona. 2021.
