WIDELink: A Bootstrapping Approach to Identifying, Modeling and Linking On-Line Data Sources

Abstract

A link discovery system must be able to augment its knowledge base by collecting information from diverse, distributed sources. We have developed a system, WideLink, that automatically extracts data from online sources, integrates it into a domain model by labeling it, and links it with facts already stored in a knowledge base. The challenge is to locate, extract, and integrate the data that comes from online sources. We addressed these problems with a bootstrapping approach in which the system leverages previously gathered data, as well as the underlying structure that many online data sources share, to identify and incorporate new data sources. WideLink systematically explores the structure of online sites so that it can retrieve pages on demand from complex web sites (e.g., sites with forms or embedded navigational structures). The system uses knowledge derived from previously gathered examples to help analyze new types of pages. Using examples of the type of information it is looking for, and characteristic patterns learned from those examples, WideLink can recognize relevant data from new sources, assign it to semantic categories within the domain model, and link it with previously learned facts.
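
The abstract does not describe WideLink's algorithms, so the following is only a minimal illustrative sketch of the general idea of bootstrapped labeling, not the system's actual method. It assumes that the "characteristic patterns" can be approximated by coarse character-class signatures, and that a new column of values takes the semantic category whose learned signatures cover at least half of its values. The function names (`signature`, `learn_patterns`, `label_column`, `bootstrap`), the 0.5 threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def signature(value):
    """Collapse a string into a coarse pattern: digit runs -> '9',
    letter runs -> 'A', whitespace runs -> ' ', other characters kept."""
    out = []
    for ch in value.strip():
        if ch.isdigit():
            cls = "9"
        elif ch.isalpha():
            cls = "A"
        elif ch.isspace():
            cls = " "
        else:
            cls = ch
        # Merge consecutive characters of the same class into one symbol.
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


def learn_patterns(examples):
    """Map each semantic category to the signatures seen in its examples."""
    return {cat: Counter(signature(v) for v in vals)
            for cat, vals in examples.items()}


def label_column(values, patterns, threshold=0.5):
    """Return (category, score) for the category whose learned signatures
    cover the largest fraction of values, or (None, score) below threshold."""
    if not values:
        return None, 0.0
    best_cat, best_score = None, 0.0
    for cat, sigs in patterns.items():
        hits = sum(1 for v in values if signature(v) in sigs)
        score = hits / len(values)
        if score > best_score:
            best_cat, best_score = cat, score
    if best_score < threshold:
        return None, best_score
    return best_cat, best_score


def bootstrap(examples, new_columns, threshold=0.5):
    """Label columns one at a time, feeding each labeled column back into
    the example pool so later columns benefit from it."""
    pool = {cat: list(vals) for cat, vals in examples.items()}
    labels = []
    for column in new_columns:
        cat, _ = label_column(column, learn_patterns(pool), threshold)
        labels.append(cat)
        if cat is not None:
            pool[cat].extend(column)
    return labels, pool


# Hypothetical seed data standing in for "previously gathered" facts.
known = {
    "phone": ["(310) 555-1234", "(213) 555-9876"],
    "zip": ["90292", "10001"],
    "name": ["John Smith", "Ann Lee"],
}
new_columns = [
    ["(415) 555-0000", "415-555-0000"],  # mixed formats, half match "phone"
    ["212-555-1111"],                    # format only seen in the column above
    ["94105"],
]
labels, grown = bootstrap(known, new_columns)
print(labels)
```

In this toy run the second column is recognizable only because the first column, once labeled, contributed a new phone-number format to the example pool. That feedback step is the bootstrapping idea in miniature; a real system would need far more robust pattern learning and a safeguard against propagating mislabeled data.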
