
Learning to construct knowledge bases from the World Wide Web

Abstract

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system that has created a knowledge base describing university people, courses, and research projects.
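
The two inputs described above (an ontology plus labeled training pages) can be illustrated with a small sketch. This is not the authors' system: the abstract does not say which learning algorithms were used, so the word-count naive Bayes classifier here is an assumption chosen for brevity. The class and relation names come from the abstract's examples, while the domain and range of each relation, the `Ontology` and `NaiveBayesPageClassifier` names, and the toy training pages are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ontology:
    """Input 1: the classes and relations of interest."""
    classes: list[str]
    # relation name -> (domain class, range class); the pairings are assumed
    relations: dict[str, tuple[str, str]]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesPageClassifier:
    """Assigns a page to an ontology class (multinomial naive Bayes, add-one smoothing)."""

    def __init__(self, ontology: Ontology) -> None:
        self.classes = ontology.classes
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def fit(self, labeled_pages: list[tuple[str, str]]) -> "NaiveBayesPageClassifier":
        """Input 2: (page text, class label) pairs."""
        for text, label in labeled_pages:
            if label not in self.word_counts:
                raise ValueError(f"label {label!r} is not in the ontology")
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str | None:
        """Return the most probable class, skipping classes with no training pages."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            if self.doc_counts[c] == 0:
                continue
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Classes and relation names follow the abstract's examples.
ontology = Ontology(
    classes=["company", "person", "employee", "product"],
    relations={
        "employed_by": ("employee", "company"),  # assumed domain/range
        "produced_by": ("product", "company"),   # assumed domain/range
    },
)

# Toy labeled pages, invented for illustration only.
training_pages = [
    ("Acme Corp is a company with offices worldwide and investors", "company"),
    ("Jane Doe home page my research interests and publications", "person"),
    ("The widget is a product available for purchase with warranty", "product"),
]

clf = NaiveBayesPageClassifier(ontology).fit(training_pages)
print(clf.predict("my publications and research interests"))
print(clf.predict("warranty for the widget"))
```

The sketch covers only page classification. Extracting relation instances such as `employed_by` would additionally require a learner over hyperlinks or pairs of pages, which is omitted here.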