Journal: International journal of knowledge-based and intelligent engineering systems

An effective cybernated word embedding system for analysis and language identification in code-mixed social media text


Abstract

Much of the language used on social media today is code-mixed text, i.e., text that mixes two or more languages. This paper applies the code-mixed index to Indian social media texts and compares the complexity of identifying language at the word level using a Bi-directional Long Short-Term Memory (BiLSTM) model. Social media platforms are now widely used by people to express their opinions and interests. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We propose a deep learning framework based on the CBOW and skip-gram models that predicts the language of origin of each word in a sequence from the specific words that precede it. The context capture module of the system gives better accuracy with the word embedding model than with character embedding.
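The abstract does not define the code-mixed index. As a rough illustration only, the sketch below assumes it refers to the utterance-level Code-Mixing Index commonly used in the code-mixing literature, CMI = 100 * (1 - max(w_i) / (n - u)), where n is the number of tokens, u the number of language-independent tokens, and max(w_i) the token count of the dominant language. The function name, the tag labels ("hi", "en", "univ") and the set of neutral tags are hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical tag set: "hi" = Hindi, "en" = English,
# "univ" = language-independent token (emoji, punctuation, etc.).
NEUTRAL_TAGS = frozenset({"univ"})


def code_mixing_index(tags, neutral=NEUTRAL_TAGS):
    """Return the assumed Code-Mixing Index (0 to 100) for one utterance.

    tags: one language tag per token, in order.
    An utterance with no language-tagged tokens scores 0.
    """
    # Count only tokens that carry a language tag (this is n - u).
    counts = Counter(t for t in tags if t not in neutral)
    lang_tokens = sum(counts.values())
    if lang_tokens == 0:
        return 0.0
    # Tokens of the dominant language (max w_i).
    dominant = max(counts.values())
    # Algebraically equal to 100 * (1 - dominant / lang_tokens).
    return 100.0 * (lang_tokens - dominant) / lang_tokens


if __name__ == "__main__":
    # Made-up word-level tags for a six-token utterance.
    example = ["hi", "hi", "en", "univ", "en", "hi"]
    print(code_mixing_index(example))
```

A monolingual utterance scores 0 under this formulation, and the score rises as the non-dominant languages take a larger share of the language-tagged tokens.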
