Lightly supervised acquisition of named entities and linguistic patterns for multilingual text mining

Cesar de Pablo-Sanchez; Isabel Segura-Bedmar; Paloma Martinez; Ana Iglesias-Maqueda

Abstract

Named Entity Recognition and Classification (NERC) is an important component of applications such as Opinion Tracking, Information Extraction, and Question Answering. When these applications must work in several languages, NERC becomes a bottleneck because its development requires language-specific tools and resources such as lists of names or annotated corpora. This paper presents a lightly supervised system that acquires lists of names and linguistic patterns from large raw text collections in western languages, starting with only a few seeds per class selected by a human expert. Experiments have been carried out with English and Spanish news collections and with the Spanish Wikipedia. Evaluation of NE classification on standard datasets shows that the NE lists achieve high precision and that contextual patterns increase recall significantly. The system would therefore be helpful for applications where annotated NERC data are not available, such as those that have to deal with several western languages or with information from different domains.
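The abstract does not spell out the acquisition procedure, so the following is only a minimal sketch of a generic seed-based bootstrapping loop of the kind the keywords suggest, not the authors' system. The function names (`context`, `bootstrap`), the one-token left/right context pattern, the capitalization filter, and the `min_support` threshold are all assumptions made for illustration; single-token names are assumed for simplicity.

```python
from collections import defaultdict


def context(tokens, i):
    """Return the (left, right) neighbor tokens around position i."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, right)


def bootstrap(sentences, seeds, rounds=2, min_support=2):
    """Grow a name list and a pattern set from a few seed names.

    Hypothetical simplification: a pattern is a (left, right) token pair,
    and it is kept only if it surrounds at least `min_support` distinct
    known names.
    """
    names = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: collect the contexts in which known names occur.
        support = defaultdict(set)
        for sent in sentences:
            toks = sent.split()
            for i, tok in enumerate(toks):
                if tok in names:
                    support[context(toks, i)].add(tok)
        patterns = {p for p, ns in support.items() if len(ns) >= min_support}

        # Step 2: apply the patterns to find new capitalized candidates.
        new = set()
        for sent in sentences:
            toks = sent.split()
            for i, tok in enumerate(toks):
                if (tok[0].isupper() and tok not in names
                        and context(toks, i) in patterns):
                    new.add(tok)
        if not new:
            break  # nothing learned in this round, so stop early
        names |= new
    return names, patterns


if __name__ == "__main__":
    corpus = [
        "the mayor of Madrid said yesterday",
        "the mayor of Paris said today",
        "the mayor of Lisbon said nothing",
        "a report on Madrid was published",
    ]
    found, pats = bootstrap(corpus, {"Madrid", "Paris"})
    print(sorted(found), sorted(pats))
```

In this toy corpus the context shared by both seeds picks up one additional name, while the context seen with only a single seed is discarded. A real system would need scoring that guards against noisy patterns drifting away from the target class.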

Bibliographic record

Source
Knowledge and Information Systems | 2013, Issue 1 | 23 pages
Authors
Cesar de Pablo-Sanchez; Isabel Segura-Bedmar; Paloma Martinez; Ana Iglesias-Maqueda
Original format: PDF
Language: English
Chinese Library Classification: Automation systems theory
Keywords
Named entity recognition and categorization; Information extraction; Multilingual natural language processing; Bootstrapping algorithms
