Venue: National Conferences on Artificial Intelligence

Learning to Extract Text-based Information from the World Wide Web


Abstract

There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences, and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues. Output from Webfoot is then passed on to CRYSTAL, an NLP system that learns text extraction rules from example. Webfoot and CRYSTAL transform the text into a formal representation that is equivalent to relational database entries. This is a necessary first step for knowledge discovery and other automated analysis of free text.
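The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the Webfoot or CRYSTAL implementation: the set of boundary tags, the `segment_page` and `extract_records` helpers, and the hand-written `FORECAST_RULE` pattern (standing in for a rule that a learner such as CRYSTAL would induce from examples) are all assumptions made for illustration, and the forecast snippet is invented sample input.

```python
import re
from html.parser import HTMLParser

# Tags treated as layout boundaries. This set is an assumption for the
# sketch, not the actual cue set used by Webfoot.
BOUNDARY_TAGS = {"p", "br", "hr", "li", "tr", "h1", "h2", "h3", "div"}


class LayoutSegmenter(HTMLParser):
    """Collect text runs, closing the current segment at each boundary tag."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        # Join buffered text and normalize whitespace before storing it.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BOUNDARY_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BOUNDARY_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text as a final segment


def segment_page(html):
    """Stage 1 (Webfoot-like): split a page into text segments."""
    parser = LayoutSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments


# Hypothetical hand-written rule standing in for a learned extraction rule.
FORECAST_RULE = re.compile(
    r"(?P<day>\w+):\s*(?P<condition>[a-z ]+?),\s*high\s+(?P<high>\d+)",
    re.IGNORECASE,
)


def extract_records(segments):
    """Stage 2: map matching segments to database-like rows (dicts)."""
    records = []
    for segment in segments:
        match = FORECAST_RULE.search(segment)
        if match:
            records.append({
                "day": match.group("day"),
                "condition": match.group("condition").strip(),
                "high": int(match.group("high")),
            })
    return records


page = "<h2>Forecast</h2><ul><li>Monday: partly cloudy, high 72<li>Tuesday: rain, high 65</ul>"
segments = segment_page(page)
records = extract_records(segments)
```

Each dictionary in `records` plays the role of one relational database row. The point of the sketch is only the division of labor: layout cues decide where a fragment begins and ends, so the extraction rule can be applied to one coherent segment at a time instead of to a run of choppy page text.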
