Information Extraction in Structured Documents Using Tree Automata Induction

Abstract

Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work for IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, such documents have a tree structure. Hence it is natural to investigate methods that are able to recognise and exploit this tree structure. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.

Bibliographic record

Source: European Conference on Principles of Data Mining and Knowledge Discovery, 2002, 12 pages
Authors: Raymond Kosala; Jan Van den Bussche; Maurice Bruynooghe
Original format: PDF
CLC classification: TP18-53

Similar literature

1. Tree Automata for Extracting Consensus from Partial Replicas of a Structured Document [J]. Maurice Tchoupé Tchendji, Milliam M. Zekeng Ndadji. Journal of Software Engineering and Applications. 2017, No. 5

2. Information extraction from structured documents using k-testable tree automaton inference [J]. Raymond Kosala, Hendrik Blockeel, Maurice Bruynooghe. Data & Knowledge Engineering. 2006, No. 2

3. A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents Clustering [J]. Issam SAHMOUDI, Hanane FROUD, Abdelmonaime LACHKAR. International Journal of Database Management Systems. 2013, No. 6

4. Information Extraction in Structured Documents Using Tree Automata Induction [C]. Raymond Kosala, Jan Van den Bussche, Maurice Bruynooghe. 6th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD 2002, Aug 19-23, 2002, Helsinki, Finland. 2002

5. Leveraging knowledge of document structure and named entities for information extraction [D]. Duncan, Frank Bissett. 2005

6. Integer programming-based method for grammar-based tree compression and its application to pattern extraction of glycan tree structures [O]. Yang Zhao, Morihiro Hayashida, Tatsuya Akutsu. 2010

7. Information extraction in structured documents using tree automata induction [O]. Kosala Raymondus, Van den Bussche Jan, Bruynooghe Maurice. 2002
