Home > International Conferences > SIGMOD/PODS > Supporting Ranking and Clustering as Generalized Order-By and Group-By*

Supporting Ranking and Clustering as Generalized Order-By and Group-By*


Abstract

The Boolean semantics of SQL queries cannot adequately capture the "fuzzy" preferences and "soft" criteria required in non-traditional data retrieval applications. One way to solve this problem is to add a flavor of "information retrieval" to database queries by allowing fuzzy query conditions and flexibly supporting grouping and ranking of the query results within the DBMS engine. While ranking is already natively supported by all major commercial DBMSs, support for flexible grouping is still very limited (i.e., group-by). In this paper, we propose to generalize group-by to enable flexible grouping (specifically, clustering) of the query results. Unlike clustering in data mining applications, our focus is on supporting efficient clustering of Boolean results generated at query time. Moreover, we propose to integrate ranking and clustering with Boolean conditions, forming a new type of ClusterRank query to allow structured data retrieval. Such an integration is nontrivial in terms of both semantics and query processing. We investigate various semantics of this type of query. To process such queries, a straightforward approach is to simply glue together the techniques developed for ranking-only and clustering-only queries. This approach is costly, since existing techniques treat both ranking and clustering as blocking post-processing tasks on Boolean query results. We propose a summary-based evaluation method that utilizes a bitmap index to seamlessly integrate Boolean conditions, clustering, and ranking. Our experimental study shows that our approach significantly outperforms the straightforward one and maintains high clustering quality.
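To make the shape of a ClusterRank-style query concrete, here is a minimal Python sketch of the three stages the abstract names: Boolean filtering through a bitmap index, grouping of the qualifying rows, and ranking within each group. It is an illustration under loud assumptions, not the paper's summary-based method: the toy table, the column names, the helper functions, the fixed-width price cells standing in for a real clustering algorithm, and the "largest size first" ranking are all invented here.

```python
from collections import defaultdict

# Hypothetical toy table; all column names and values are invented for illustration.
ROWS = [
    {"id": 0, "city": "Chicago", "type": "condo", "price": 210, "size": 80},
    {"id": 1, "city": "Chicago", "type": "house", "price": 450, "size": 200},
    {"id": 2, "city": "Chicago", "type": "condo", "price": 230, "size": 95},
    {"id": 3, "city": "Urbana", "type": "condo", "price": 120, "size": 70},
    {"id": 4, "city": "Chicago", "type": "condo", "price": 480, "size": 150},
    {"id": 5, "city": "Chicago", "type": "condo", "price": 460, "size": 120},
]


def build_bitmap_index(rows, columns):
    """Map each (column, value) pair to an int whose bit i is set iff row i matches."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        for col in columns:
            index[(col, row[col])] |= 1 << i
    return index


def matching_rows(bitmap):
    """Yield the row positions whose bits are set in the bitmap."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1


def cluster_rank(rows, index, conditions, cell_width, k):
    """Filter with bitmaps, group by price cell, keep the k largest rows per group."""
    # Boolean conjunction = bitwise AND of the per-condition bitmaps.
    bitmap = (1 << len(rows)) - 1
    for cond in conditions:
        bitmap &= index.get(cond, 0)
    # Assumed stand-in for clustering: bucket qualifying rows into fixed-width price cells.
    groups = defaultdict(list)
    for i in matching_rows(bitmap):
        groups[rows[i]["price"] // cell_width].append(rows[i])
    # Rank inside each group by size (descending) and keep the top k row ids.
    return {
        cell: [r["id"] for r in sorted(members, key=lambda r: -r["size"])[:k]]
        for cell, members in sorted(groups.items())
    }


INDEX = build_bitmap_index(ROWS, ["city", "type"])
RESULT = cluster_rank(
    ROWS, INDEX, [("city", "Chicago"), ("type", "condo")], cell_width=100, k=2
)
print(RESULT)
```

In this sketch the grouping and ranking steps still run only after the Boolean filter has finished, which is close to the "straightforward" pipeline the abstract describes as costly; the only part borrowed from the proposed direction is the use of bitmaps to evaluate the Boolean conditions.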
