FactRunner: A New System for NLP-Based Information Extraction from Wikipedia

Abstract

Wikipedia is playing an increasing role as a source of human-readable knowledge, because it contains an enormous amount of high-quality information written by human authors. Finding a relevant piece of information in this huge collection of natural language text is often a time-consuming process, as a keyword-based search interface is the main method for querying. Therefore, an iterative process to explore the document collection to find the information of interest is required. In this paper, we present an approach to extract structured information from unstructured documents to enable structured queries. Information Extraction (IE) systems have been proposed for this task, but due to the complexity of natural language, they often produce unsatisfactory results. As Wikipedia contains, in addition to the plain natural language text, links between documents and other metadata, we propose an approach which exploits this information to extract more accurate structured information. Our proposed system FactRunner focuses on extracting structured information from sentences containing such links, because the links may indicate more accurate information than other sentences. We evaluated our system with a subset of documents from Wikipedia and compared the results with another existing system. The results show that a natural language parser combined with Wikipedia markup can be exploited for extracting facts in the form of triple statements with high accuracy.

Bibliographic details

Source
International Conference on Web Information Systems and Technologies | 2014 | pp. 225-240 | 16 pages
Authors
Rhio Sutoyo; Christoph Quix; Fisnik Kastrati
Original format: PDF
Keywords
Information extraction; Semantic search

Similar literature

1. Semantic NLP-Based Information Extraction from Construction Regulatory Documents for Automated Compliance Checking [J]. Zhang Jiansong, El-Gohary Nora M. Journal of Computing in Civil Engineering. 2016, No. 2
2. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach [J]. Zorana Ratkovic, Wiktoria Golik, Pierre Warnier. BMC Bioinformatics. 2012, Supplement 11
3. WHAD: Wikipedia historical attributes data: Historical structured data extraction and vandalism detection from the Wikipedia edit history [J]. Enrique Alfonseca, Guillermo Garrido, Jean-Yves Delort. Language Resources and Evaluation. 2013, No. 4
4. FactRunner: A New System for NLP-Based Information Extraction from Wikipedia [C]. Rhio Sutoyo, Christoph Quix, Fisnik Kastrati. International Conference on Web Information Systems and Technologies. 2014
5. Entity Extraction and Disambiguation in Short Text Using Wikipedia and Semantic User Profiles [D]. Zendejas, Ignacio. 2014
6. Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach [O]. Zorana Ratkovic, Wiktoria Golik, Pierre Warnier. 2012
7. NLP-based Extraction of Modificatory Provisions Semantics [O]. Alessandro Mazzei, Daniele P. Radicioni, Raffaella Brighi. 2013
8. SAWUS: Siena's Automatic Wikipedia Update System [R]. Tompkins, C., Witter, Z., Small, S. G. 2012
