Conference: International Conference on Autonomous Agents

A web-based information system that reasons with structured collections of text

Abstract

The degree to which information sources are pre-processed by Web-based information systems varies greatly. In search engines like AltaVista, little pre-processing is done, while in "knowledge integration" systems, complex site-specific "wrappers" are used to integrate different information sources into a common database representation. In this paper we describe an intermediate point between these two models. In our system, information sources are converted into a highly structured collection of small fragments of text. Database-like queries to this structured collection of text fragments are approximated using a novel logic called WHIRL, which combines inference in the style of deductive databases with ranked retrieval methods from information retrieval (IR). WHIRL allows queries that integrate information from multiple Web sites without requiring the extraction and normalization of object identifiers that can be used as keys; instead, operations that in conventional databases require equality tests on keys are approximated using IR similarity metrics for text. This reduces the amount of human engineering required to field a knowledge integration system. Experimental evidence is given showing that many information sources can be easily modeled with WHIRL, and that inferences in the logic are both accurate and efficient.
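The idea of replacing key-equality tests with a text similarity score can be illustrated with a small sketch. This is not the WHIRL implementation: it assumes TF-IDF weighting with cosine similarity as the IR metric, scores every pair by brute force, and uses hypothetical function names (`tfidf_vectors`, `cosine`, `similarity_join`) chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens are good enough for a demo.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, top_k=3):
    """Rank (left, right) pairs by text similarity instead of testing key equality."""
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    scored = [
        (cosine(a, b), l, r)
        for a, l in zip(lv, left)
        for b, r in zip(rv, right)
    ]
    scored = [s for s in scored if s[0] > 0]  # drop pairs with no shared terms
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical name fields drawn from two different sites.
    site_a = ["Acme Corporation", "Globex Inc", "Initech Software"]
    site_b = ["ACME Corp.", "Globex Incorporated", "Initech"]
    for score, a, b in similarity_join(site_a, site_b):
        print(f"{score:.3f}  {a!r} ~ {b!r}")
```

The join returns a ranked list of candidate pairs rather than an exact match set, which is the trade-off the abstract describes: no normalized identifiers are needed, but the answers are approximate and ordered by score.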