Incorporating Linguistic Expertise Using ILP for Named Entity Recognition in Data Hungry Indian Languages



Abstract

Developing linguistically sound and data-compliant rules for named entity annotation is usually an intensive and time-consuming process for any developer or linguist. In this work, we present the use of two Inductive Logic Programming (ILP) techniques to construct rules for extracting instances of various named entity classes, thereby reducing the effort required of a linguist or developer. Using ILP for rule development not only reduces the amount of effort required but also provides an interactive framework wherein a linguist can incorporate intuition about named entities, such as in the form of mode declarations for refinements (suitably exposed for ease of use by the linguist) and background knowledge (in the form of linguistic resources). We have a small amount of tagged data: approximately 3884 sentences for Marathi and 22748 sentences for Hindi. The paucity of tagged data for Indian languages makes manual development of rules more challenging. However, the ability to fold background knowledge and domain expertise into ILP techniques comes to our rescue, and we have been able to develop rules that are mostly linguistically sound and that yield results comparable to rules handcrafted by linguists. The ILP approach has two advantages over hand-crafting all rules: (i) the development time reduces by a factor of 240 when ILP is used instead of involving a linguist for the entire rule development, and (ii) the ILP technique has the computational edge that it has a complete and consistent view of all significant patterns in the data at the level of abstraction specified through the mode declarations. Point (ii) enables the discovery of rules that could be missed by the linguist and also makes it possible to scale the rule development to a larger training dataset.
The rules thus developed can optionally be edited by linguistic experts and consolidated either (a) through default ordering (as in TILDE [1]), (b) with an ordering induced using [2], or (c) by using the rules as features in a statistical graphical model such as a conditional random field (CRF) [3]. We report results using WARMR [4] and TILDE to learn rules for named entities in two Indian languages, namely Hindi and Marathi.
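
As a rough illustration of consolidation options (a) and (c) described in the abstract, the Python sketch below treats each induced rule as a predicate over a token position. Everything in it is an assumption made for illustration: the rule names, the trigger words, the label set, and the token representation are hypothetical and are not taken from the paper, and the code does not reproduce the output format of TILDE or WARMR.

```python
from typing import Callable, Dict, List, Tuple

Token = Dict[str, str]  # hypothetical token representation: {"word": ...}
Condition = Callable[[List[Token], int], bool]
Rule = Tuple[str, Condition, str]  # (rule name, condition, predicted label)


# Hypothetical stand-ins for induced rules; NOT rules reported in the paper.
def prev_is_title(sent: List[Token], i: int) -> bool:
    """Fires when the previous token is an honorific or title."""
    return i > 0 and sent[i - 1]["word"] in {"Shri", "Dr."}


def next_is_place_suffix(sent: List[Token], i: int) -> bool:
    """Fires when the next token is a common place-name suffix."""
    return i + 1 < len(sent) and sent[i + 1]["word"] in {"Nagar", "Pur"}


RULES: List[Rule] = [
    ("prev_is_title", prev_is_title, "PERSON"),
    ("next_is_place_suffix", next_is_place_suffix, "LOCATION"),
]


def label_by_default_order(sent: List[Token], i: int,
                           rules: List[Rule] = RULES,
                           default: str = "O") -> str:
    """Option (a): apply rules in their listed order; the first match wins."""
    for _, condition, label in rules:
        if condition(sent, i):
            return label
    return default


def rule_features(sent: List[Token], i: int,
                  rules: List[Rule] = RULES) -> Dict[str, bool]:
    """Option (c): expose each rule as a binary feature for a sequence model."""
    return {name: condition(sent, i) for name, condition, _ in rules}


sentence = [{"word": "Dr."}, {"word": "Ramesh"}, {"word": "Nagar"}]
print(label_by_default_order(sentence, 1))
print(rule_features(sentence, 1))
```

In this toy sentence both rules fire on the middle token, so the ordered list resolves the conflict by position, whereas the feature view passes both signals on and leaves the weighting to the downstream model (a CRF in the abstract's option (c)).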


Bibliographic details

Source: International Conference on Inductive Logic Programming | 2010 | 8 pages
Authors: Anup Patel; Ganesh Ramakrishnan; Pushpak Bhattacharya
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Named Entity Recognition; WARMR; TILDE; ILP


Similar literature

1. A multiobjective simulated annealing approach for classifier ensemble: Named entity recognition in Indian languages as case studies [J]. Asif Ekbal, Sriparna Saha. Expert Systems with Applications. 2011, No. 12
2. Named Entity Recognition in Indian Languages Using Maximum Entropy Approach [J]. Asif Ekbal, Sivaji Bandyopadhyay. International Journal of Computer Processing of Languages. 2008, No. 3
3. Generic Feature Selection Methodology to Named Entity Detection from Indian and European Languages [J]. Malarkodi C. S., Devi S. L. Advances in Electrical and Computer Engineering. 2019, No. 1
4. Incorporating Linguistic Expertise Using ILP for Named Entity Recognition in Data Hungry Indian Languages [C]. Anup Patel, Ganesh Ramakrishnan, Pushpak Bhattacharya. International Conference on Inductive Logic Programming. 2010
5. A data-intensive approach to named entity recognition using domain and language independent methods [D]. Osesina, Olukayode Isaac. 2010
6. A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records [O]. Xiaoling Cai, Shoubin Dong, Jinlong Hu. 2019
7. Survey of Named Entity Recognition Techniques for Various Indian Regional Languages [O]. Shrutika Kale, Sharvari Govilkar. 2017
