Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus



Abstract

In the Thai language, a named entity can be used with or without a prefix or an indicator word. This can cause confusion between named entities and other types of nouns. However, a named entity is likely to be used adjacent to verbs or prepositions, which means that the verbs or prepositions adjacent to a noun can serve as a good feature for determining the type of named entity. Some studies on the named entity recognition (NER) task in other languages, such as Indonesian, show that combining word embedding with part-of-speech (POS) tags can improve the performance of an NER model. In this paper, we investigate the Thai named entity recognition task using a Bi-LSTM model with word embedding and POS embedding to deal with a relatively small and disjointedly labeled corpus. We compare our model with one without POS tags and with a baseline CRF model that uses a similar set of features. The experimental results show that our proposed model outperforms the other two in all F1-score measures. In particular, for the location file, the F1-score increases by 14 percent.
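The observation that neighbouring verbs or prepositions help identify a named entity can be illustrated with a small feature extractor. This is only a sketch and not the authors' implementation: the function name `context_features`, the tag names (`VERB`, `PREP`, `NOUN`), the boundary markers and the toy sentence are all assumptions made for illustration. It shows the kind of per-token feature dictionary that a CRF-style baseline might consume.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set here is hypothetical


def context_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Build features for token i from its own word/POS and its neighbours' POS."""
    word, pos = sentence[i]
    feats = {"word": word, "pos": pos}
    # POS of the previous token, or a boundary marker at sentence start.
    feats["prev_pos"] = sentence[i - 1][1] if i > 0 else "<BOS>"
    # POS of the next token, or a boundary marker at sentence end.
    feats["next_pos"] = sentence[i + 1][1] if i < len(sentence) - 1 else "<EOS>"
    # Flag the cue described in the abstract: an adjacent verb or preposition.
    cues = {"VERB", "PREP"}
    feats["near_verb_or_prep"] = str(
        feats["prev_pos"] in cues or feats["next_pos"] in cues
    )
    return feats


# Toy sentence (assumed tags): "ไป" (go) followed by "เชียงใหม่" (Chiang Mai).
sentence = [("ไป", "VERB"), ("เชียงใหม่", "NOUN")]
features = context_features(sentence, 1)
```

In a neural setup such as the Bi-LSTM described above, the same word and POS information would typically enter the model as learned embeddings rather than as string-valued features.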


Bibliographic details

Source
International Conference on Advanced Informatics: Concept Theory and Applications | 2018 | pp. 30-35 | 6 pages
Conference location: Krabi (TH)
Authors
Kitiya Suriyachay; Virach Sornlertlamvanich;
Author affiliation

School of ICT Sirindhorn International Institute of Technology Thammasat University Thailand;

Original format: PDF
Keywords
Hidden Markov models; Predictive models; Task analysis; Organizations; Recurrent neural networks; Training; Data models;


