Journal: Information Sciences: An International Journal

A low redundancy strategy for keyword search in structured and semi-structured data


Abstract

Keyword search has been recognised as a viable alternative for information search in semi-structured and structured data sources. Current state-of-the-art keyword-search techniques over relational databases do not take advantage of the correlative meta-information included in structured and semi-structured data sources, and therefore leave relevant answers out. These techniques are also limited by scalability, performance and precision issues that become evident when they are applied to large datasets. Based on an in-depth analysis of issues related to indexing and ranking semi-structured and structured information, we propose a new keyword-search algorithm that takes into account the semantic information extracted from the schemas of the structured and semi-structured data sources and combines it with the textual relevance obtained by a common text retrieval approach. The algorithm is implemented in a keyword-based search engine called KESOSASD (Keyword Search Over Semi-structured and Structured Data), improving its precision and response time. Our approach models the semi-structured and structured information as graphs and makes use of a Virtual Document Structure Aware Inverted Index (VDSAII). This index is created from a set of logical structures called Virtual Documents, which capture and exploit the implicit structural relationships (semantics) depicted in the schemas of the structured and semi-structured data sources. Extensive experiments were conducted to demonstrate that KESOSASD outperforms existing approaches in terms of search efficiency and accuracy. Moreover, KESOSASD is prepared to scale out and manage large databases without degrading its effectiveness.
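The abstract does not give the internals of VDSAII, so the following is only a toy sketch of the general idea it describes: related tuples are merged into one "virtual document", indexed per field, and ranked by a text score combined with a structure-derived weight. The schema (`papers`, `authors`), the field weights in `FIELD_WEIGHT`, the TF-IDF variant, and all function names are assumptions made for illustration, not the paper's method.

```python
import math
from collections import defaultdict

# Hypothetical toy schema: each paper references an author via a foreign key.
authors = {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}}
papers = [
    {"id": 10, "title": "keyword search over graphs", "author_id": 1},
    {"id": 11, "title": "computable numbers", "author_id": 2},
    {"id": 12, "title": "graph search notes", "author_id": 2},
]

def build_virtual_documents(papers, authors):
    """Join each paper with its author so related tuples are indexed as one unit."""
    vdocs = {}
    for p in papers:
        vdocs[p["id"]] = {
            "paper.title": p["title"],
            "author.name": authors[p["author_id"]]["name"],
        }
    return vdocs

def build_index(vdocs):
    """Inverted index: term -> {(vdoc_id, field): term frequency}."""
    index = defaultdict(lambda: defaultdict(int))
    for vid, fields in vdocs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[term][(vid, field)] += 1
    return index

# Assumed weights standing in for semantics derived from the schema.
FIELD_WEIGHT = {"paper.title": 1.0, "author.name": 0.5}

def search(query, index, n_docs):
    """Rank virtual documents by TF-IDF scaled by the weight of the matching field."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        doc_freq = len({vid for vid, _ in postings})
        if doc_freq == 0:
            continue
        idf = math.log(1 + n_docs / doc_freq)
        for (vid, field), tf in postings.items():
            scores[vid] += tf * idf * FIELD_WEIGHT[field]
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

vdocs = build_virtual_documents(papers, authors)
index = build_index(vdocs)
for vid, score in search("turing search", index, len(vdocs)):
    print(vid, round(score, 3))
```

In this sketch, paper 12 ranks first for the query "turing search" because it matches both keywords, one in its own title and one through the joined author tuple. A search over the `papers` table alone would miss that second match.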
