Learning to Extract Text-based Information from the World Wide Web


Abstract

There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences, and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues. Output from Webfoot is then passed on to CRYSTAL, an NLP system that learns text extraction rules from example. Webfoot and CRYSTAL transform the text into a formal representation that is equivalent to relational database entries. This is a necessary first step for knowledge discovery and other automated analysis of free text.
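The segmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not Webfoot itself: the abstract does not say which layout cues Webfoot uses, so the set of break tags below, the `LayoutSegmenter` class, and the `segment_page` function are all assumptions made for illustration. The sketch only shows the general idea of cutting a page into text segments at layout tags, so that each fragment could be handed to a downstream extractor as one unit.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as layout cues that end a segment.
# The abstract does not list the delimiters Webfoot actually uses.
BREAK_TAGS = {"p", "br", "hr", "li", "tr", "td", "div",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class LayoutSegmenter(HTMLParser):
    """Collect text runs separated by layout tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.segments = []   # finished segments, in page order
        self._buffer = []    # text pieces of the segment being built

    def _flush(self):
        # Join the pieces with a space and collapse runs of whitespace.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> do not split a segment.
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def segment_page(html):
    """Return the page text as a list of layout-delimited segments."""
    parser = LayoutSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments
```

For a page such as `<h2>Apartments</h2><ul><li>Capitol Hill 1 br <b>$675</b></li><li>Ballard 2 br $895</li></ul>`, the function yields one segment for the heading and one per list item, with the bold price kept inside its listing.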


Bibliographic details

Source: National Conferences on Artificial Intelligence, 1999, 4 pages
Author: Stephen Soderland
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature


1. A Signal-Representation-Based Parser to Extract Text-Based Information from the Web [J]. Mu-Chun Su, Shao-Jui Wang, Chen-Ko Huang. Journal of Advanced Computational Intelligence and Intelligent Informatics, 2010, No. 5a77
2. Learning second language vocabulary: neural dissociation of situation-based learning and text-based learning [J]. Jeong H, Sugiura M, Sassa Y. NeuroImage, 2010, No. 2
3. Webmining: learning from the world wide web [J]. Jan Larsen, Lars Kai Hansen, Anna Szymkowiak Have. Computational Statistics & Data Analysis, 2002, No. 4
4. Learning to Extract Text-based Information from the World Wide Web [C]. Stephen Soderland. National Conferences on Artificial Intelligence, 1999
5. Information-seeking on the World Wide Web: The effects of searching and browsing strategies on navigational patterns and mental models of navigation in the World Wide Web environment [D]. Chang, Chien-Fu. 2003
6. Study protocol for iQuit in Practice: a randomised controlled trial to assess the feasibility, acceptability and effectiveness of tailored web- and text-based facilitation of smoking cessation in primary care [O]. Stephen Sutton, Susan Smith, James Jamison. 2013
7. Evaluating text-based information on the World Wide Web [O]. Wopereis, Iwan G. J. H., van Merrienboer, Jeroen J. G. 2011
8. Learning to Extract Symbolic Knowledge from the World Wide Web [R]. Craven, M., McCallum, A., DiPasquo, D. 1998
