Journal: ACM Transactions on Asian Language Information Processing

A Named Entity Recognition Method Based on Decomposition and Concatenation of Word Chunks

Abstract

We propose a named entity (NE) recognition method in which word chunks are repeatedly decomposed and concatenated. Our method first identifies word chunks with a base chunker, such as a noun phrase chunker, and then recognizes NEs from the resulting word chunk sequences. Working with word chunks gives access to features that word-sequence-based recognition methods cannot use, such as the first and last words of a word chunk. However, a word chunk may contain only part of an NE, or several NEs. To handle this, we use four operators: SHIFT, which separates the first word from a word chunk; POP, which separates the last word from a word chunk; JOIN, which concatenates two word chunks; and REDUCE, which assigns an NE label to a word chunk. We evaluate our method on a Japanese NE recognition dataset that contains about 200,000 annotations of 191 NE types drawn from over 8,500 news articles. The experimental results show that our method trains and processes text faster than a linear-chain structured perceptron and a semi-Markov perceptron while maintaining high accuracy.
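The four operators can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that SHIFT and POP leave the detached word as a new one-word chunk, that JOIN merges a chunk with its right-hand neighbour, and that REDUCE simply attaches a label. The action sequence is written by hand here, standing in for the decisions the paper's trained model would make. The English example sentence, the labels `PERSON`/`CITY`/`DATE`, and every name in the code (`Chunk`, `apply_actions`, `boundary_features`, `entities`) are hypothetical. `boundary_features` shows the kind of chunk-level features the abstract mentions (first and last word of a chunk).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    words: List[str]
    label: Optional[str] = None  # NE label set by REDUCE; None means unlabeled


def shift(chunks: List[Chunk], i: int) -> None:
    """SHIFT: separate the first word of chunk i into a chunk of its own."""
    words = chunks[i].words
    if len(words) < 2:
        raise ValueError("SHIFT needs a chunk of at least two words")
    chunks[i:i + 1] = [Chunk(words[:1]), Chunk(words[1:])]


def pop(chunks: List[Chunk], i: int) -> None:
    """POP: separate the last word of chunk i into a chunk of its own."""
    words = chunks[i].words
    if len(words) < 2:
        raise ValueError("POP needs a chunk of at least two words")
    chunks[i:i + 1] = [Chunk(words[:-1]), Chunk(words[-1:])]


def join(chunks: List[Chunk], i: int) -> None:
    """JOIN: concatenate chunk i with chunk i + 1 (any labels they had are dropped)."""
    if i + 1 >= len(chunks):
        raise IndexError("JOIN needs a following chunk")
    chunks[i:i + 2] = [Chunk(chunks[i].words + chunks[i + 1].words)]


def reduce_(chunks: List[Chunk], i: int, label: str) -> None:
    """REDUCE: assign an NE label to chunk i."""
    chunks[i].label = label


def boundary_features(chunk: Chunk) -> Dict[str, object]:
    """Chunk-level features that a plain word-sequence tagger does not see directly."""
    return {"first_word": chunk.words[0], "last_word": chunk.words[-1],
            "length": len(chunk.words)}


# (operator name, chunk index, label or None); indices refer to the list as it is
# at the moment the action is applied, so they shift after SHIFT/POP/JOIN.
Action = Tuple[str, int, Optional[str]]


def apply_actions(chunks: List[Chunk], actions: List[Action]) -> List[Chunk]:
    ops = {"SHIFT": shift, "POP": pop, "JOIN": join}
    for name, i, label in actions:
        if name == "REDUCE":
            reduce_(chunks, i, label)
        else:
            ops[name](chunks, i)
    return chunks


def entities(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Return (surface string, label) for every labeled chunk."""
    return [(" ".join(c.words), c.label) for c in chunks if c.label is not None]


if __name__ == "__main__":
    # Hypothetical base-chunker output: one chunk holds a title plus a name,
    # one NE is split across two chunks, and one chunk wraps a date.
    chunks = [Chunk(w) for w in (["President", "Obama"], ["visited"], ["New"],
                                 ["York", "City"], ["on", "Monday"])]
    actions = [("POP", 0, None), ("REDUCE", 1, "PERSON"),
               ("JOIN", 3, None), ("REDUCE", 3, "CITY"),
               ("SHIFT", 4, None), ("REDUCE", 5, "DATE")]
    apply_actions(chunks, actions)
    print(entities(chunks))
    # → [('Obama', 'PERSON'), ('New York City', 'CITY'), ('Monday', 'DATE')]
    print(boundary_features(chunks[3]))
    # → {'first_word': 'New', 'last_word': 'City', 'length': 3}
```

Operating on chunks instead of single words means one REDUCE labels a multi-word NE in a single step. This is consistent with the abstract's speed comparison against a linear-chain perceptron, although the sketch makes no claim about how the paper actually scores or orders actions.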
