Conference: International Conference on Very Large Data Bases

Searchlight: Enabling Integrated Search and Exploration over Large Multidimensional Data

Abstract

We present a new system, called Searchlight, that uniquely integrates constraint solving and data management techniques. It allows Constraint Programming (CP) machinery to run efficiently inside a DBMS without the need to extract, transform and move the data. This marriage concurrently offers the rich expressiveness and efficiency of constraint-based search and optimization provided by modern CP solvers, and the ability of DBMSs to store and query data at scale, resulting in an enriched functionality that can effectively support both data- and search-intensive applications. As such, Searchlight is the first system to support generic search, exploration and mining over large multi-dimensional data collections, going beyond point algorithms designed for point search and mining tasks.

Searchlight makes the following scientific contributions:

1. Constraint solvers as first-class citizens. Instead of treating solver logic as a black box, Searchlight provides native support, incorporating the necessary APIs for its specification and transparent execution as part of query plans, as well as novel algorithms for its optimized execution and parallelization.

2. Speculative solving. Existing solvers assume that the entire data set is main-memory resident. Searchlight uses an innovative two-stage Solve-Validate approach that allows it to operate speculatively yet safely on main-memory synopses, quickly producing candidate search results that can later be efficiently validated on real data.

3. Computation and I/O load balancing. As CP solver logic can be computationally expensive, executing it on large search and data spaces requires novel CPU-I/O balancing approaches when performing search distribution.

We built a prototype implementation of Searchlight on Google's Or-Tools, an open-source suite of operations research tools, and the array DBMS SciDB. Extensive experimental results show that Searchlight often performs orders of magnitude faster than the next best approach (SciDB-only or CP-solver-only) in terms of end response time and time to first result.
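
The two-stage Solve-Validate idea described in the abstract can be illustrated with a small sketch. This is not Searchlight's code or API: it uses plain Python instead of Or-Tools and SciDB, the function names (`build_synopsis`, `avg_bounds`, `solve`, `validate`) are hypothetical, and the synopsis is assumed to be a per-chunk (min, max) summary of a one-dimensional array. The task is to find all fixed-length windows whose average lies in a given range. Stage 1 works only on the synopsis and keeps every window that the bounds cannot rule out, so it may return false positives but never drops a true result. Stage 2 checks the surviving candidates against the real data.

```python
from typing import List, Tuple

Synopsis = List[Tuple[float, float]]  # one (min, max) pair per chunk


def build_synopsis(data: List[float], chunk: int) -> Synopsis:
    """Summarize each chunk by its (min, max); assumed small enough for memory."""
    return [(min(data[i:i + chunk]), max(data[i:i + chunk]))
            for i in range(0, len(data), chunk)]


def avg_bounds(synopsis: Synopsis, chunk: int,
               start: int, length: int) -> Tuple[float, float]:
    """Lower and upper bounds on the average of data[start:start + length]."""
    end = start + length
    lo_sum = hi_sum = 0.0
    for c in range(start // chunk, (end - 1) // chunk + 1):
        c_min, c_max = synopsis[c]
        # Number of window cells that fall inside chunk c.
        overlap = min(end, (c + 1) * chunk) - max(start, c * chunk)
        lo_sum += c_min * overlap
        hi_sum += c_max * overlap
    return lo_sum / length, hi_sum / length


def solve(synopsis: Synopsis, chunk: int, n: int,
          length: int, lo: float, hi: float) -> List[int]:
    """Stage 1 (speculative): keep windows the synopsis cannot rule out."""
    candidates = []
    for start in range(n - length + 1):
        b_lo, b_hi = avg_bounds(synopsis, chunk, start, length)
        if b_hi >= lo and b_lo <= hi:
            candidates.append(start)
    return candidates


def validate(data: List[float], candidates: List[int],
             length: int, lo: float, hi: float) -> List[int]:
    """Stage 2: confirm each candidate against the real data."""
    return [s for s in candidates
            if lo <= sum(data[s:s + length]) / length <= hi]


data = [1, 2, 3, 4, 8, 9, 10, 9, 2, 2, 2, 2]
synopsis = build_synopsis(data, chunk=4)
candidates = solve(synopsis, 4, len(data), length=2, lo=9.0, hi=10.0)
results = validate(data, candidates, length=2, lo=9.0, hi=10.0)
print(candidates, results)  # prints [4, 5, 6] [5, 6]
```

In this toy run the window starting at index 4 survives Stage 1 because its chunk's bounds (8, 10) overlap the target range, but its true average of 8.5 fails validation. Only the three candidates touch the real data, instead of all eleven windows.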
