OpenCeres: When Open Information Extraction Meets the Semi-Structured Web

Abstract

Open Information Extraction (OpenIE), the problem of harvesting triples from natural language text whose predicate relations are not aligned to any pre-defined ontology, has been a popular subject of research for the last decade. However, this research has largely ignored the vast quantity of facts available in semi-structured webpages. In this paper, we define the problem of OpenIE from semi-structured websites to extract such facts, and present an approach for solving it. We also introduce a labeled evaluation dataset to motivate research in this area. Given a semi-structured website and a set of seed facts for some relations existing on its pages, we employ a semi-supervised label propagation technique to automatically create training data for the relations present on the site. We then use this training data to learn a classifier for relation extraction. On our new benchmark dataset, this method obtained a precision of over 70%. A larger-scale extraction experiment on 31 websites in the movie vertical resulted in the extraction of over 2 million triples.
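The label propagation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic, textbook-style propagation over a small weighted graph with clamped seeds, and the node names (`dom_node_1`, ...), the edge weights, the relation labels, and the 0.5 threshold are all assumptions chosen for illustration. The abstract does not say how the graph over page elements is built, so the edges here are simply given as input.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, num_iters=20):
    """Generic iterative label propagation with clamped seed nodes.

    edges: iterable of (node_a, node_b, weight) undirected edges.
    seeds: dict mapping node -> relation label for the seed nodes.
    Returns a dict mapping each node to a {label: score} distribution.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    labels = sorted(set(seeds.values()))
    # Every node starts with zero mass; seeds put all mass on their label.
    scores = {n: {l: 0.0 for l in labels} for n in list(neighbors)}
    for node, label in seeds.items():
        scores.setdefault(node, {l: 0.0 for l in labels})
        scores[node][label] = 1.0

    for _ in range(num_iters):
        updated = {}
        for node in scores:
            total = sum(w for _, w in neighbors[node])
            if node in seeds or total == 0:
                # Seeds stay clamped; isolated nodes keep their scores.
                updated[node] = scores[node]
                continue
            # Unlabeled nodes take the weighted average of their neighbors.
            updated[node] = {
                l: sum(w * scores[m][l] for m, w in neighbors[node]) / total
                for l in labels
            }
        scores = updated
    return scores


def to_training_labels(scores, threshold=0.5):
    """Keep only nodes whose best label score reaches the threshold."""
    result = {}
    for node, dist in scores.items():
        label, score = max(dist.items(), key=lambda kv: kv[1])
        if score >= threshold:
            result[node] = label
    return result


# Hypothetical toy graph: nodes stand in for candidate fields on a page,
# edges link fields assumed to be similar (for example, a shared layout).
edges = [
    ("dom_node_1", "dom_node_2", 1.0),
    ("dom_node_2", "dom_node_3", 1.0),
    ("dom_node_4", "dom_node_5", 1.0),
]
seeds = {"dom_node_1": "director", "dom_node_4": "genre"}

scores = propagate_labels(edges, seeds)
training_labels = to_training_labels(scores)
```

In a pipeline of the kind the abstract describes, the resulting `training_labels` would serve as automatically created training data for a relation classifier. The thresholding step is one simple way to discard weakly supported labels before training.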

Bibliographic record

Source
Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 3047-3056 (10 pages)
Authors
Colin Lockard; Prashant Shiralkar; Xin Luna Dong
Original format
PDF
