A Modular Approach for Social Media Text Normalization

Abstract

Normalized data is the backbone of many Natural Language Processing (NLP), Information Retrieval (IR), data mining, and Machine Translation (MT) applications. We therefore propose an approach to normalize the colloquial and abbreviated text posted on social media such as Twitter and Facebook. The proposed approach is based on Levenshtein distance, the demetaphone algorithm, and dictionary mappings. The standard LexNorm 1.2 dataset, which contains English tweets, is used to validate the proposed modular approach. Experimental results are compared with existing unsupervised approaches. The modular approach outperforms the other normalization techniques examined, achieving 83.6% precision, recall, and F-score, along with a BLEU score of 91.1%.
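
To make the ingredients named in the abstract concrete, the following is a minimal sketch of two of them: a dictionary mapping followed by a Levenshtein-distance fallback. It is not the authors' implementation. The `SLANG_MAP` and `VOCAB` contents, the `max_dist` threshold, and all function names are assumptions chosen for illustration, and the phonetic (demetaphone) stage is left out entirely.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical toy resources; a real system would load much larger lists.
SLANG_MAP = {"u": "you", "2moro": "tomorrow", "gr8": "great"}
VOCAB = ["tomorrow", "great", "see", "you", "please", "tonight"]


def normalize_token(token: str, max_dist: int = 2) -> str:
    """Normalize one token: dictionary lookup first, then nearest vocabulary word."""
    low = token.lower()
    if low in SLANG_MAP:   # direct dictionary mapping
        return SLANG_MAP[low]
    if low in VOCAB:       # already a known word (returned lowercased)
        return low
    # Fall back to the closest vocabulary word by edit distance.
    best = min(VOCAB, key=lambda w: levenshtein(low, w))
    # Leave the token unchanged when nothing is close enough.
    return best if levenshtein(low, best) <= max_dist else token


def normalize(text: str) -> str:
    """Normalize a whitespace-tokenized string token by token."""
    return " ".join(normalize_token(t) for t in text.split())


if __name__ == "__main__":
    print(normalize("plese see u 2moro"))
```

Keeping each stage in its own function mirrors the modular idea in the title: a phonetic matcher could be added as another step in `normalize_token` without touching the other stages.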

Bibliographic details

Source
International Conference on Frontiers of Intelligent Computing: Theory and Applications | 2018 | xx 583 pages | 9 pages in total
Authors
Palak Rehan; Mukesh Kumar; Sarbjeet Singh
Original format
PDF
Chinese Library Classification
TP301.4-532
Keywords
backbone; various; Natural
Date indexed
2022-08-21 12:16:38

Similar literature

1. Machine Normalization: Bringing Social Media Text from Non-Standard to Standard Form [J]. Zarnoufi Randa, Jaafar Hamid, Abik Mounia. ACM Transactions on Asian and Low-Resource Language Information Processing. 2020, No. 4

2. Context-sensitive normalization of social media text in Bahasa Indonesia based on neural word embeddings [J]. Renny Pradina Kusumawardani, Stezar Priansya, Faizal Johan Atletiko. Procedia Computer Science. 2018, No. 22

3. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task [J]. Abeed Sarker, Maksim Belousov, Jasper Friedrichs, et al. Journal of the American Medical Informatics Association. 2018, No. 10

4. A Modular Approach for Social Media Text Normalization [C]. Palak Rehan, Mukesh Kumar, Sarbjeet Singh. International Conference on Frontiers of Intelligent Computing: Theory and Applications. 2018

5. Incorporate Out-of-Vocabulary Words for Psycholinguistic Analysis using Social Media Texts - An OOV-Aware Data Curation Process and a Hybrid Approach [D]. Liu, Kun. 2021

6. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task [O]. Abeed Sarker, Maksim Belousov, Jasper Friedrichs, et al. 2018

7. A Cascaded Approach for Social Media Text Normalization of Turkish [O]. 2015
