Word Level Language Identification of Code Mixing Text in Social Media using NLP


Abstract

Understanding social media content has been a primary research topic since the dawn of social networking. Of particular interest is the contextual understanding of noisy text, which is characterized by a high percentage of spelling mistakes, creative spelling, phonetic typing, wordplay, abbreviations, and meta tags. Processing such data therefore demands a more complex system than traditional natural language processors. In addition, people readily mix two or more languages to express their thoughts on social media. Automatic language identification at the word level thus becomes a necessary part of analyzing noisy social media content, and it would help with the automated analysis of content generated on social media. This study uses Tamil-English code-mixed data from popular social media posts and comments and provides word-level language tags using Natural Language Processing (NLP) and modern Machine Learning (ML) technologies. The methodology is a novel approach implemented as a machine learning classifier based on features such as Tamil Unicode characters in Roman script, dictionaries, double consonants, and term frequency. Several machine learning classifiers, namely Naive Bayes, Logistic Regression, Support Vector Machines (SVM), Decision Trees, and Random Forest, were used in training and testing. Among them, the SVM classifier obtained the highest accuracy, 89.46%.
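
The feature set named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's dictionaries, feature definitions, or training data, so everything below is an assumption made for illustration: the two toy word lists, the function name word_features, the reading of "Tamil Unicode characters" as a check against the Tamil Unicode block (U+0B80 to U+0BFF), and the reading of "double consonant" as a repeated Latin consonant letter.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; the paper's actual dictionaries are not given.
ENGLISH_WORDS = {"movie", "super", "song", "nice", "very"}
TAMIL_ROMAN_WORDS = {"padam", "romba", "nalla", "iruku", "paattu"}

# Any character from the Tamil Unicode block (U+0B80 to U+0BFF).
TAMIL_BLOCK = re.compile(r"[\u0B80-\u0BFF]")
# The same Latin consonant letter twice in a row, e.g. "ll" or "tt".
DOUBLE_CONSONANT = re.compile(r"([b-df-hj-np-tv-z])\1")


def word_features(word, term_counts):
    """Return an assumed feature dict for one token of a code-mixed post."""
    w = word.lower()
    return {
        "has_tamil_unicode": bool(TAMIL_BLOCK.search(w)),
        "in_english_dict": w in ENGLISH_WORDS,
        "in_tamil_dict": w in TAMIL_ROMAN_WORDS,
        "double_consonant": bool(DOUBLE_CONSONANT.search(w)),
        "term_frequency": term_counts.get(w, 0),
    }


if __name__ == "__main__":
    # Invented example post; not taken from the paper's data set.
    tokens = "padam romba nalla iruku super movie".split()
    counts = Counter(t.lower() for t in tokens)
    for token in tokens:
        print(token, word_features(token, counts))
```

Feature dictionaries of this kind could then be vectorized and passed to any of the classifiers the abstract lists, for example an SVM; that training step is left out here because the abstract does not describe the training setup.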


Bibliographic details

Source: International Conference on Information Technology Research | 2018 | 1 v. | 5 pages
Authors: Kasthuri Shanmugalingam; Sagara Sumathipala; Chinthaka Premachandra
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Social networking (online); Machine learning; Dictionaries; Support vector machines; Natural language processing; Training; Feature extraction


Similar documents


1. Language identification framework in code-mixed social media text based on quantum LSTM - the word belongs to which language? [J]. Modern Physics Letters B: Condensed Matter Physics, Statistical Physics, Applied Physics. 2020, No. 6
2. An effective cybernated word embedding system for analysis and language identification in code-mixed social media text [J]. Shekhar Shashi, Sharma Dilip Kumar, Sufyan Beg M.M. International Journal of Knowledge-Based and Intelligent Engineering Systems. 2019, No. 3
3. Experimenting Language Identification for Sentiment Analysis of English Punjabi Code Mixed Social Media Text [J]. International Journal of E-Adoption. 2020, No. 1
4. Word Level Language Identification of Code Mixing Text in Social Media using NLP [C]. Kasthuri Shanmugalingam, Sagara Sumathipala, Chinthaka Premachandra. 2018 3rd International Conference on Information Technology Research. 2018
5. Intermediate-Level Chinese Language Learners' Social Communication in Chinese on Facebook: A Mixed Methods Study [D]. Wang, Shenggao. 2013
6. Text Comprehension and Oral Language as Predictors of Word-Problem Solving: Insights into Word-Problem Solving as a Form of Text Comprehension [O]. Lynn S. Fuchs, Jennifer K. Gilbert, Douglas Fuchs
7. Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text [O]. Das Amitava, Gambäck Björn. 2016
