Venue: IEEE International Conference on Big Data

Anuj@IEEE BigData 2019: A Novel Code-Switching Behavior Analysis in Social Media Discussions Natural Language Processing

Abstract

As the internet and social media break down barriers to communication, more and more people across the globe have started to use social media platforms such as Facebook, Twitter, and Instagram. Most people communicate multilingually to convey information across the globe. They converse about shared topics in common forums, with multiple languages used either by an individual speaker or by a group of speakers. This makes the context more complex to understand and makes various Natural Language Processing (NLP) tasks even harder. The user behavior of mixing multiple languages within a single discussion topic, often involving multiple communities, is referred to as code-switching. At the IEEE BigData 2019 conference, a Shared Task (Understanding Multilingual Communities through Analysis of Code-Switching Behaviors in Social Media Discussions) was conducted as a track of the Big Data Cup. The first task is to detect the language of each post in a discussion forum in which multiple languages appear. The second is to determine a relevance score for each post, that is, how closely its content is connected or appropriate to the discussion. This paper proposes a novel approach to detecting the language of each word in a post using NLP techniques involving linguistics, the Python package langdetect, and various other approaches. It also explains how Machine Learning is applied to estimate the relevance of a post and other metrics required for prediction. Code-mixing detection is an important first step for any NLP application, because the language of a post must be determined before any NLP task can be performed on social media text.
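To make the word-level idea concrete, here is a minimal sketch of tagging each word in a post with a language and flagging the post as code-switched. It is not the paper's method: the abstract names langdetect, whereas this sketch swaps in a tiny hand-made lexicon lookup so that it runs with the standard library alone. The `LEXICONS` table and the function names `tag_words`, `is_code_switched`, and `dominant_language` are hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical toy lexicons (assumption for illustration only). A real system
# would use a statistical detector or far larger word lists.
LEXICONS = {
    "en": {"the", "is", "movie", "very", "good", "i", "this", "but"},
    "es": {"la", "es", "muy", "pero", "película", "buena", "yo", "esta"},
}


def tag_words(post):
    """Label each whitespace-separated word with a language code or 'unk'."""
    tags = []
    for token in post.lower().split():
        word = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
        if not word:
            continue
        lang = next(
            (code for code, words in LEXICONS.items() if word in words),
            "unk",
        )
        tags.append((word, lang))
    return tags


def is_code_switched(tags):
    """Treat a post as code-switched if it has known words from 2+ languages."""
    langs = {lang for _, lang in tags if lang != "unk"}
    return len(langs) >= 2


def dominant_language(tags):
    """Return the most frequent known language, or 'unk' if none was found."""
    counts = Counter(lang for _, lang in tags if lang != "unk")
    return counts.most_common(1)[0][0] if counts else "unk"


if __name__ == "__main__":
    tags = tag_words("This movie is muy buena!")
    print(tags)
    print(is_code_switched(tags), dominant_language(tags))
```

A lexicon lookup is only a stand-in: it cannot handle words shared between languages, misspellings, or transliterated text, all of which are common in social media posts.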
