OLERA: OnLine Extraction Rule Analysis for Semi-structured Documents

Abstract

Information extraction (IE) from semi-structured Web documents plays an important role for a variety of information agents. Over the past decade, researchers have developed a rich family of generic IE techniques based on supervised approaches, which learn extraction rules from user-labelled training examples. However, annotating training data can be expensive when many data sources need to be extracted. In this article, we introduce annotation-free IE using pattern mining and string alignment techniques. We describe OLERA, a semi-supervised IE system that produces extraction rules by aligning similar contents of multiple input records and presenting the result in a spreadsheet-like table. Users therefore do not need to annotate the input documents; they only need to specify the schema for the extracted data after the extraction pattern is discovered. Another advantage is that this approach works not only for multi-record Web pages (to which some unsupervised IE approaches are limited) but also for single-record Web pages.
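
The idea of aligning similar records into a spreadsheet-like table can be illustrated with a minimal sketch. This is not OLERA's actual algorithm, which the abstract does not specify. It uses standard Needleman-Wunsch global alignment over token lists, and the token names, the scoring values, the `GAP` marker and the `align` helper are all assumptions made for illustration.

```python
GAP = "-"  # hypothetical marker for an empty table cell


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists (Needleman-Wunsch), returning two equal-length rows."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover the columns.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append(GAP)
            i -= 1
        else:
            row_a.append(GAP)
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1]


# Two made-up records, tokenized into tags and abstract text classes.
rec1 = ["<b>", "TITLE", "</b>", "<i>", "AUTHOR", "</i>", "<br>"]
rec2 = ["<b>", "TITLE", "</b>", "<br>"]

row_a, row_b = align(rec1, rec2)
for row in (row_a, row_b):
    print("\t".join(row))
```

Each column of the printed table groups tokens that play the same role across records; the second record gets empty cells where it has no author field. A user could then name the columns to define the output schema, which is the step the abstract leaves to the user.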

Bibliographic details

Source
IASTED International Multi-conference on Applied Informatics | 2004 | 7 pages
Authors
Chia-Hui Chang; Shih-Chien Kuo
Original format: PDF
Chinese Library Classification: TP18-53
Keywords
information extraction; semi-structured documents; string alignment; approximate matching

Similar literature

1. Learning element similarity matrix for semi-structured document analysis [J]. Jianwu Yang, William K. Cheung, Xiaoou Chen. Knowledge and Information Systems. 2009, No. 1
2. Exploratory Visual Analysis and Interactive Pattern Extraction from Semi-Structured Data [J]. Axel J. Soto, Ryan Kiros, Vlado Keselj. ACM Transactions on Interactive Intelligent Systems. 2015, No. 3
3. OLERA: OnLine Extraction Rule Analysis for Semi-structured Documents [C]. Chia-Hui Chang, Shih-Chien Kuo. IASTED (International Association of Science and Technology for Development) International Conference on Artificial Intelligence and Applications, v.2; 2004-02-16 to 2004-02-18; Innsbruck, AT. 2004
4. A comparative analysis framework for semi-structured documents, with applications to government regulations [D]. Lau, Gloria T. 2004
5. Cleavage Site Analysis Using Rule Extraction from Neural Networks [O]. Yeun-Jin Cho, Hyeoncheol Kim
6. Generating Association Rules from Semi-Structured Documents Using an Extended Concept Hierarchy [O]. Lisa Singh, Peter Scheuermann, Bin Chen. 1997
