An effective cybernated word embedding system for analysis and language identification in code-mixed social media text

Shekhar Shashi; Sharma Dilip Kumar; Sufyan Beg M.M.


Abstract

Social media users today often write code-mixed text, i.e., text that mixes two or more languages. This paper applies the code-mixed index to Indian social media texts and compares the complexity of identifying language at the word level using a Bi-directional Long Short-Term Memory model. Social media platforms are now widely used by people to express their opinions and interests. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We propose a deep learning framework based on the cBoW and skip-gram models that predicts the language of origin of each word in a sequence from the specific words that precede it. With the system's context capture module, the word embedding model achieves better accuracy than the character embedding model.
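
The abstract names a code-mixed index but does not define it. As a rough illustration only, the Python sketch below computes one commonly cited utterance-level formulation of the Code-Mixing Index, CMI = 100 * (1 - max(w_i) / (n - u)), where n is the number of tokens, u the number of language-independent tokens, and max(w_i) the token count of the dominant language. Whether the paper uses exactly this formulation is an assumption; the function name, the "univ" tag, and the sample tags are hypothetical.

```python
from collections import Counter

# Hypothetical tag for tokens that belong to no language
# (punctuation, emoji, named entities, ...). Not taken from the paper.
LANG_INDEPENDENT = "univ"


def code_mixing_index(tags):
    """Return a CMI-style score in [0, 100) for one utterance.

    `tags` holds one language tag per token. A score of 0 means the
    utterance is monolingual or has no language-specific tokens.
    """
    n = len(tags)
    u = sum(1 for t in tags if t == LANG_INDEPENDENT)
    if n == u:
        return 0.0
    # Count tokens per language, ignoring language-independent ones.
    counts = Counter(t for t in tags if t != LANG_INDEPENDENT)
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / (n - u))


# Made-up word-level tags for a Hindi-English utterance.
tags = ["hi", "en", "en", "hi", "hi", "hi", "univ"]
score = code_mixing_index(tags)
print(f"CMI = {score:.2f}")
```

In a pipeline like the one the abstract describes, the per-token tags would come from the word-level language identifier, so a score like this could be computed per post and compared across Facebook, Twitter, and WhatsApp data.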

Bibliographic details

Source
International journal of knowledge-based and intelligent engineering systems | 2019, Issue 3 | pp. 167-179 | 13 pages
Authors
Shekhar Shashi; Sharma Dilip Kumar; Sufyan Beg M.M.
Author affiliations

Department of Computer Engineering and Applications, GLA University;

Department of Computer Engineering, Aligarh Muslim University

Original format: PDF
Language of text: English
Keywords
Language identification; transliteration; character embedding; word embedding; Natural Language Processing; cBoW; skip-gram
