
Rapper


Abstract

Database management systems are becoming available for semistructured data; however, these tools cannot be used on many real-world data sources (e.g., most web sites) in their native form. Often, wrappers are needed to extract information and organize it into a graph structure that makes explicit the concepts users want to query and update. This paper presents a new approach to wrapper generation that exploits linguistic knowledge. The approach produces a more fine-grained parse of sources containing natural language text than previous efforts. The resulting graph-structured databases can answer queries that could not be formulated against databases produced by previously generated wrappers. In addition, our approach may be more robust in the face of slight variations in word choice and order. We discuss a prototype implementation, lessons learned to date, evaluation issues, and future research directions.
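
As a rough illustration of the wrapper idea described above, the following Python sketch extracts facts from short natural language sentences and stores them as an edge-labeled graph. It is not the paper's method or prototype: the verb lexicon, the two sentence patterns, and the names `VERB_LEXICON`, `wrap`, and `targets` are all assumptions made for this example. The small lexicon stands in for linguistic knowledge, so that different word choices ("wrote", "authored") and different word orders (active, passive) map to the same graph edge.

```python
import re
from collections import defaultdict

# Hypothetical lexicon (an assumption for this sketch): several surface
# verbs map to one canonical edge label in the graph.
VERB_LEXICON = {
    "wrote": "author_of",
    "authored": "author_of",
    "published": "publisher_of",
    "released": "publisher_of",
}

_VERBS = "|".join(sorted(VERB_LEXICON))

# Passive voice: "<object> was <verb> by <subject>."
PASSIVE = re.compile(
    rf"^(?P<obj>.+?)\s+was\s+(?P<verb>{_VERBS})\s+by\s+(?P<subj>.+?)\.?$",
    re.IGNORECASE,
)
# Active voice: "<subject> <verb> <object>."
ACTIVE = re.compile(
    rf"^(?P<subj>.+?)\s+(?P<verb>{_VERBS})\s+(?P<obj>.+?)\.?$",
    re.IGNORECASE,
)


def wrap(text):
    """Toy wrapper: turn sentences into a graph {node: [(label, target), ...]}."""
    graph = defaultdict(list)
    # Split on whitespace that follows a period; good enough for this sketch.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        # Try the passive pattern first, then fall back to the active one.
        match = PASSIVE.match(sentence) or ACTIVE.match(sentence)
        if match is None:
            continue  # sentence not covered by the toy patterns
        label = VERB_LEXICON[match.group("verb").lower()]
        graph[match.group("subj")].append((label, match.group("obj")))
    return dict(graph)


def targets(graph, node, label):
    """Query helper: all nodes reachable from `node` over edges named `label`."""
    return [dst for (lbl, dst) in graph.get(node, []) if lbl == label]
```

For instance, `wrap("Jane Doe wrote Graph Basics.")` and `wrap("Graph Basics was authored by Jane Doe.")` yield the same graph, and `targets(graph, "Jane Doe", "author_of")` can then be queried without knowing which phrasing the source used. A real wrapper would need a proper parser and a far larger lexicon; the regular expressions here only cover the two sentence shapes shown.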

