Text normalization for named entity recognition in Vietnamese tweets

Vu H. Nguyen; Hien T. Nguyen; Vaclav Snasel



Abstract

Background: Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes, such as person, location, and organization. This paper focuses on tweets posted on Twitter. Because tweets are noisy, irregular, and brief, and include acronyms and spelling errors, NER on tweets is a challenging task. Many approaches have been proposed to deal with this problem for tweets written in English, German, Chinese, and other languages, but none for Vietnamese tweets.

Methods: We propose a method that normalizes a tweet before passing it as input to a learning model for NER in Vietnamese tweets. The normalization step detects spelling errors in a tweet and corrects them using an improved Dice's coefficient or n-grams. A Support Vector Machine learning algorithm is employed to learn a classifier using six different types of features.

Results and Conclusion: We train our method on a training set consisting of more than 40,000 named entities and evaluate it on a testing set consisting of 3,186 named entities. The experimental results show that our system achieves state-of-the-art performance, with an F1 score of 82.13%.
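
The spelling-correction idea in the Methods paragraph can be illustrated with a small sketch. It uses the standard Dice coefficient over character bigrams, not the authors' improved variant, whose details the abstract does not give. The function names, the 0.5 threshold, and the toy vocabulary are assumptions made for illustration only.

```python
def char_bigrams(word: str) -> set[str]:
    """Return the set of character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_coefficient(a: str, b: str) -> float:
    """Standard Dice coefficient on bigram sets: 2 * |X & Y| / (|X| + |Y|)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        # Both words are too short to have bigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def suggest_correction(token: str, vocabulary: list[str],
                       threshold: float = 0.5) -> str:
    """Return the most similar vocabulary word, or the token unchanged
    when no candidate reaches the (assumed) similarity threshold."""
    if token in vocabulary:
        return token
    best = max(vocabulary, key=lambda w: dice_coefficient(token, w),
               default=token)
    return best if dice_coefficient(token, best) >= threshold else token


# Hypothetical toy vocabulary (unaccented forms, for illustration only).
vocabulary = ["nguoi", "ngay", "nha"]
print(suggest_correction("nguoii", vocabulary))  # misspelled token
print(suggest_correction("xyz", vocabulary))     # no close candidate
```

In this sketch a token is replaced only when some vocabulary word shares enough bigrams with it, so out-of-vocabulary names are left untouched instead of being forced onto a dictionary word.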

Bibliographic details

Source: Computational Social Networks, 2016, Issue 1, 16 pages
Authors: Vu H. Nguyen; Hien T. Nguyen; Vaclav Snasel
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Similar literature

1. Improving Named Entity Recognition in Vietnamese Texts by a Character-Level Deep Lifelong Learning Model [J]. Ngoc-Vu Nguyen, Thi-Lan Nguyen, Cam-Van Nguyen Thi. Vietnam Journal of Computer Science, 2019, Issue 4.
2. Why pay more? A simple and efficient named entity recognition system for tweets [J]. Suman Chanchal, Reddy Saichethan Miriyala, Saha Sriparna. Expert Systems with Applications, 2021, Issue Apra.
3. Named-Entity Recognition on Indonesian Tweets using Bidirectional LSTM-CRF [J]. Deni Cahya Wintaka, Moch Arif Bijaksana, Ibnu Asror. Procedia Computer Science, 2019, Issue 11.
4. Named entity recognition and normalization in tweets towards text summarization [C]. Jabeen Saima, Shah Sajid, Latif Asma. International Conference on Digital Information Management, 2013.
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007.
6. Text normalization for named entity recognition in Vietnamese tweets [O]. Vu H. Nguyen, Hien T. Nguyen, Vaclav Snasel.
7. Text normalization for named entity recognition in Vietnamese tweets [O]. 2016.
