Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction

Abstract

Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy, in the sense that facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when trying to extract information from sources with little redundancy, such as corporate intranets, encyclopedic works, or scientific databases. We present results on applying a weakly supervised pattern induction algorithm to Wikipedia to extract instances of arbitrary relations. In particular, we apply different configurations of a basic algorithm for pattern induction to seven different datasets. We show that the lack of redundancy creates a need for a large amount of training data, but that integrating Web extraction into the process significantly reduces the required training data while maintaining the accuracy of Wikipedia. In particular, we show that, although using the Web can have effects similar to those produced by increasing the number of seeds, it leads to better results overall. Our approach thus makes it possible to combine the advantages of two sources: the high reliability of a closed corpus and the high redundancy of the Web.
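
The abstract's central idea, that a pattern only generalizes once a fact is mentioned often enough, can be illustrated with a toy sketch. This is not the authors' algorithm: the function names, the `min_support` threshold, the `ARG1`/`ARG2` placeholders, and the tiny `wiki` and `web` sentence lists are all assumptions made for illustration. The sketch induces patterns from sentences that mention a seed pair, then matches those patterns against a closed corpus.

```python
import re
from collections import Counter

ARG1, ARG2 = "ARG1", "ARG2"  # hypothetical slot placeholders

def induce_patterns(sentences, seeds, min_support=2):
    """Turn sentences mentioning a seed pair into slot patterns.

    A pattern is kept only if it is observed at least min_support times,
    which is where redundancy (or the lack of it) matters.
    """
    counts = Counter()
    for sentence in sentences:
        for first, second in seeds:
            # Keep the sketch simple: each argument must occur exactly once.
            if sentence.count(first) == 1 and sentence.count(second) == 1:
                pattern = sentence.replace(first, ARG1).replace(second, ARG2)
                counts[pattern] += 1
    return {p for p, n in counts.items() if n >= min_support}

def extract(sentences, patterns):
    """Match patterns against whole sentences and collect argument pairs."""
    found = set()
    for pattern in patterns:
        regex = (re.escape(pattern)
                 .replace(ARG1, r"(?P<first>\w+)")
                 .replace(ARG2, r"(?P<second>\w+)"))
        compiled = re.compile(regex)
        for sentence in sentences:
            match = compiled.fullmatch(sentence)
            if match:
                found.add((match.group("first"), match.group("second")))
    return found

# Toy data (invented): a closed corpus with little redundancy,
# and Web snippets that repeat the seed fact.
wiki = [
    "Paris is the capital of France",
    "Madrid is the capital of Spain",
    "Berlin is the capital of Germany",
]
web = [
    "Paris is the capital of France",
    "Visit Paris in France",
]
seeds = {("Paris", "France")}

closed_only = extract(wiki, induce_patterns(wiki, seeds))
with_web = extract(wiki, induce_patterns(wiki + web, seeds))
```

With a single seed, the closed corpus alone never reaches the support threshold, so `closed_only` is empty. Adding the Web snippets to the induction step supplies the missing redundancy, while instances are still extracted only from the closed corpus.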

Bibliographic record

Source: European Conference on Principles and Practice of Knowledge Discovery in Databases | 2007 | 12 pages
Authors: Sebastian Blohm; Philipp Cimiano
Original format: PDF
Chinese Library Classification: TP3-532

Similar literature

1. Efficient learning algorithm for sparse subsequence pattern-based classification and applications to comparative animal trajectory data analysis [J] . Sakuma Takuto, Nishi Kazuya, Kishimoto Kaoru, Advanced Robotics: The International Journal of the Robotics Society of Japan . 2019, No. 3-4

2. Common spatial pattern-based feature extraction from the best time segment of BCI data [J] . ÖNDER AYDEMİR, Turkish Journal of Electrical Engineering and Computer Sciences . 2016, No. 5

3. Increasing value and reducing waste in data extraction for systematic reviews: tracking data in data extraction forms [J] . Farhad Shokraneh, Clive E. Adams, Systematic Reviews . 2017, No. 1

4. Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction [C] . Sebastian Blohm, Philipp Cimiano, European Conference on Principles and Practice of Knowledge Discovery in Databases; 17-21 September 2007; Warsaw (PL) . 2007

5. Design and Development of Intelligent Web Mining System for Extraction of Information from Web Databases [D] . Sharma, Sanjeev Kumar. 2010

6. Response to ‘Increasing value and reducing waste in data extraction for systematic reviews: tracking data in data extraction forms’ [O] . Jens Jap, Ian J. Saldanha, Bryant T. Smith, 2018

7. Using the Web to Reduce Data Sparseness in Pattern-based Information Extraction [O] . Sebastian Blohm 2007
