International Conference on Advanced Informatics: Concept Theory and Applications

Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus

Abstract

In the Thai language, a named entity can be used with or without a prefix or an indicator word. This may cause confusion between named entities and other types of nouns. However, a named entity is likely to appear adjacent to verbs or prepositions, so the verbs or prepositions adjacent to a noun can serve as a good feature for determining the type of named entity. Studies on the named entity recognition (NER) task in other languages, such as Indonesian, show that combining word embeddings with part-of-speech (POS) tags can improve the performance of an NER model. In this paper, we investigate the Thai named entity recognition task using a Bi-LSTM model with word embeddings and POS embeddings to deal with a relatively small and disjointedly labeled corpus. We compare our model with one that does not use POS tags and with a baseline CRF model that uses a similar set of features. The experimental results show that our proposed model outperforms the other two on all F1-score measures. In particular, for the location file, the F1-score increases by 14 percent.
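The input representation described in the abstract, a word embedding combined with a POS embedding for each token, might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the two embeddings are combined, so concatenation is assumed here, and the helper names (`make_embedding_table`, `build_inputs`), the embedding sizes, the random initialization, and the romanized toy tokens are all hypothetical. The resulting per-token vectors are what a Bi-LSTM layer would then consume.

```python
import random


def make_embedding_table(vocab, dim, seed):
    """Build a toy lookup table: one random vector of length `dim` per symbol."""
    rng = random.Random(seed)
    return {symbol: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for symbol in vocab}


def build_inputs(tokens, pos_tags, word_table, pos_table):
    """Concatenate each token's word vector with its POS-tag vector.

    Symbols missing from a table fall back to that table's "<unk>" entry.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return [
        word_table.get(tok, word_table["<unk>"]) + pos_table.get(tag, pos_table["<unk>"])
        for tok, tag in zip(tokens, pos_tags)
    ]


# Hypothetical sizes and vocabularies, chosen only for illustration.
WORD_DIM, POS_DIM = 8, 4
word_table = make_embedding_table(["<unk>", "pai", "thi", "chiangmai"], WORD_DIM, seed=0)
pos_table = make_embedding_table(["<unk>", "VERB", "ADP", "NOUN"], POS_DIM, seed=1)

# Toy romanized sentence: a verb and a preposition preceding a location name.
tokens = ["pai", "thi", "chiangmai"]
pos_tags = ["VERB", "ADP", "NOUN"]

inputs = build_inputs(tokens, pos_tags, word_table, pos_table)
print(len(inputs), len(inputs[0]))  # 3 12
```

Each token is represented by a vector of length `WORD_DIM + POS_DIM`, so the POS tags of neighboring verbs and prepositions are available to the sequence model alongside the word identities. In a trained model the two tables would be learned parameters rather than fixed random vectors.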