Automatic Learning Common Definitional Patterns from Multi-domain Wikipedia Pages

Abstract

Automatic definition extraction has attracted wide interest in NLP and in knowledge-based applications. One primary task of definition extraction is mining patterns from definitional sentences. Existing methods for extracting definitional patterns either rely on manual extraction guided by intuition or observation, or mine intricate definitional patterns automatically. The manual approach requires substantial human effort to identify definitional patterns because lexico-syntactic structures are diverse, and it inevitably performs poorly, especially on cross-domain corpora. The automatic approach mainly optimizes the precision of definition extraction, at the cost of lower recall. Neither is suitable for cross-domain definition extraction. To address these issues, this paper proposes a method for automatically extracting definitional patterns from multi-domain definitional sentences in Wikipedia. Our method, FIND-SS, is a modification of the FIND-S algorithm that addresses definition extraction from cross-domain corpora. FIND-SS adopts a "the more similar, the higher the priority" scheme to improve learning performance. It can tolerate some noisy information and does not require any pattern seeds for pattern learning. The experimental results indicate that our approach is significantly superior to the previous method.
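
The abstract does not spell out how FIND-SS works, so the following is only a rough sketch of the general idea it names: the classic FIND-S generalization step, applied to examples in order of similarity to the current hypothesis. Everything specific here is an assumption made for illustration and is not taken from the paper: the fixed-length token representation, the `?` wildcard, the `similarity` measure, the `min_similarity` noise threshold, and the function names.

```python
from typing import List, Optional

WILDCARD = "?"  # assumed marker for "any token at this position"


def generalize(hypothesis: List[str], example: List[str]) -> List[str]:
    """Classic FIND-S step: keep agreeing positions, wildcard the rest."""
    return [h if h == e else WILDCARD for h, e in zip(hypothesis, example)]


def similarity(hypothesis: List[str], example: List[str]) -> int:
    """Count positions where the example equals the hypothesis token."""
    return sum(1 for h, e in zip(hypothesis, example) if h == e)


def find_s_similarity_first(
    examples: List[List[str]], min_similarity: int = 1
) -> Optional[List[str]]:
    """Learn one pattern from equal-length token sequences.

    Hypothetical variant: the most similar remaining example is processed
    first, and examples below `min_similarity` are skipped as noise.
    """
    if not examples:
        return None
    remaining = [list(e) for e in examples]
    hypothesis = remaining.pop(0)  # start from the most specific hypothesis
    while remaining:
        best = max(remaining, key=lambda e: similarity(hypothesis, e))
        remaining.remove(best)
        if similarity(hypothesis, best) < min_similarity:
            continue  # too dissimilar: treated as noise, not generalized over
        hypothesis = generalize(hypothesis, best)
    return hypothesis


# Toy, made-up sentences reduced to four slots; the last one is noise.
sentences = [
    ["TERM", "is", "a", "NOUN"],
    ["TERM", "is", "an", "NOUN"],
    ["TERM", "was", "a", "NOUN"],
    ["click", "here", "to", "login"],
]
pattern = find_s_similarity_first(sentences)
print(pattern)
```

In this sketch no pattern seeds are needed, because the first example serves as the initial hypothesis, and the threshold keeps one unrelated sentence from collapsing the pattern into all wildcards. Whether FIND-SS handles ordering and noise this way cannot be determined from the abstract alone.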

Bibliographic details

Source
IEEE International Conference on Data Mining Workshops, 2014, pp. 251-258 (8 pages)
Authors
Jingsong Zhang; Yinglin Wang; Dingyu Yang
Original format: PDF
Keywords
Web sites; data mining; feature extraction; knowledge based systems; learning (artificial intelligence); natural language processing; NLP; Wikipedia page; automatic definition extraction; cross-domain corpora; knowledge-based application; lexico-syntactic structure; natural language processing; pattern learning; pattern mining; Electronic publishing; Encyclopedias; Internet; Training; Upper bound; Vectors; FIND-S algorithm; definition extraction; definitional pattern; frequent pattern; similarity priority;
