Source journal: Data & Knowledge Engineering

SUSAX: Context-specific searching in XML documents using sequence alignment techniques

Abstract

Keyword searching, while very successful in narrowing down the contents of the Web to the pertinent subset of information, has two primary drawbacks. First, the accuracy of the search is closely coupled with the choice of keywords. Second, keywords are limited in their expressiveness. In particular, they fail to adequately capture the contextual information implicit in most searches done by users. In this paper we present an approach that efficiently addresses these drawbacks of keyword searching over XML documents. In particular, we present SUSAX, a system for approximate contextual querying over XML documents in which queries are represented as simple XPaths. A key contribution of our work is the novel algorithm used to match the XPath-like query with similar paths in the repository. The algorithm is based on the sequence alignment algorithms prevalent in the life sciences domain for discovering the similarity between genome and protein sequences. In this paper, we show an adaptation of the sequence alignment algorithm for discovering and cataloging the similarity between two paths.
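
The abstract does not give the details of the SUSAX matching algorithm, so the following is only a minimal sketch of the general idea: treating each path as a sequence of element names and scoring two paths with a textbook global alignment (Needleman-Wunsch). The function names `split_path`, `alignment_score`, and `rank_paths`, the example paths, and the scoring values (match = 2, mismatch = -1, gap = -1) are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a simple XPath such as '/a/b/c' into its element names."""
    return [step for step in path.split("/") if step]


def alignment_score(query: List[str], candidate: List[str],
                    match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score (Needleman-Wunsch) between two step sequences.

    The scoring values are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(query) + 1, len(candidate) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per step.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == candidate[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            skip_query_step = score[i - 1][j] + gap
            skip_candidate_step = score[i][j - 1] + gap
            score[i][j] = max(diagonal, skip_query_step, skip_candidate_step)

    return score[-1][-1]


def rank_paths(query: str, repository: List[str]) -> List[str]:
    """Order repository paths from most to least similar to the query path."""
    query_steps = split_path(query)
    return sorted(
        repository,
        key=lambda path: alignment_score(query_steps, split_path(path)),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical repository paths, used only to exercise the sketch.
    paths = [
        "/journal/issue",
        "/book/editor/name",
        "/library/book/author/name",
        "/book/author/name",
    ]
    for path in rank_paths("/book/author/name", paths):
        print(path)
```

Under these assumed scores, a path that differs from the query by one extra ancestor element ranks above a path in which one element name has been substituted, which is the kind of approximate, context-aware matching the abstract describes. A real system would likely tune the scores and might compare element names by similarity rather than strict equality.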