Semi-supervised Sequence Labeling for Named Entity Extraction based on Tri-Training: Case Study on Chinese Person Name Extraction

Abstract

Named entity extraction is a fundamental task for many knowledge engineering applications. Existing studies rely on annotated training data, which is expensive to obtain at scale, and this limits recognition effectiveness. In this research, we propose an automatic labeling procedure that prepares training data from structured resources containing known named entities. Because the automatically labeled training data may contain noise, a self-testing procedure can be applied as a follow-up to remove low-confidence annotations and improve extraction performance with less training data. In addition to preparing labeled training data, we employ semi-supervised learning to exploit large amounts of unlabeled data. By modifying tri-training for sequence labeling and deriving a proper initialization, we further improve entity extraction. On the task of Chinese personal name extraction, with 364,685 sentences (8,672 news articles) and 54,449 (11,856 distinct) person names, an F-measure of 90.4% is achieved.
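
The two ideas named in the abstract, automatic labeling from known names and tri-training agreement, can be illustrated with a minimal sketch. This is not the authors' implementation: the character-level B/I/O scheme, the longest-match rule, the whole-sequence agreement criterion, and the function names `auto_label` and `pseudo_labels_for` are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def auto_label(sentence: str, known_names: Set[str]) -> List[str]:
    """Tag each character B/I/O by longest-match lookup against known names.

    Assumption: labels are assigned per character, which the abstract
    does not specify.
    """
    tags = ["O"] * len(sentence)
    max_len = max((len(name) for name in known_names), default=0)
    i = 0
    while i < len(sentence):
        matched = 0
        # Try the longest candidate first so a full name wins over its prefix.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in known_names:
                matched = length
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def pseudo_labels_for(target: int,
                      predictions: List[List[List[str]]]) -> Dict[int, List[str]]:
    """Pick pseudo-labels for model `target` in a three-model tri-training round.

    predictions[k][s] is model k's tag sequence for unlabeled sentence s.
    Assumption: a sentence is accepted only when the other two models
    produce identical tag sequences for it.
    """
    a, b = [k for k in range(3) if k != target]
    return {s: seq for s, seq in enumerate(predictions[a])
            if seq == predictions[b][s]}
```

In this sketch, sentences accepted by `pseudo_labels_for` would be added to the training set of the target model for the next round; how the paper scores confidence or initializes the three models is not stated in the abstract and is left out here.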

Bibliographic details

Source
3rd Workshop on semantic web and information extraction | 2014 | pp. 33-40 | 8 pages
Conference location: Dublin (IE)
Authors
Chien-Lung Chou; Chia-Hui Chang; Shin-Yi Wu
Author affiliations

National Central University, Taoyuan, Taiwan;

National Central University, Taoyuan, Taiwan;

Industrial Technology Research Institute, Taiwan;

Original format: PDF
Language: English

Similar literature

1. Sequence Validation Based Extraction of Named High Cardinality Entities [J] . Khamisi Kalegele, Hideyuki Takahashi, Kazuto Sasai, International Journal of Intelligence Science . 2012, No. 4
2. A relation extraction method of Chinese named entities based on location and semantic features [J] . Haiguang Li, Xindong Wu, Zhao Li, Applied Intelligence . 2013, No. 1
3. A relation extraction method of Chinese named entities based on location and semantic features [J] . Li H., Wu X., Li Z., Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies . 2013, No. 1
4. Semi-supervised Sequence Labeling for Named Entity Extraction based on Tri-Training: Case Study on Chinese Person Name Extraction [C] . Chien-Lung Chou, Chia-Hui Chang, Shin-Yi Wu, Workshop on semantic web and information extraction . 2014
5. Learning for information extraction: From named entity recognition and disambiguation to relation extraction [D] . Bunescu, Razvan Constantin. 2007
6. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations [O] . Tome Eftimov, Barbara Koroušić Seljak, Peter Korošec
7. Semi-supervised Sequence Labeling for Named Entity Extraction based on Tri-Training: Case Study on Chinese Person Name Extraction [O] . Chien-Lung Chou, Chia-Hui Chang, Shin-Yi Wu . 2015
