Journal: Information Processing & Management

Influence of social conversational features on language identification in highly multilingual online conversations


Abstract

With the explosion of multilingual content on the Web, particularly on social media platforms, identifying the languages present in a text is becoming an important task for various applications. Automatic language identification (ALI) in social media text is already considered a non-trivial task because of slang words, misspellings, creative spellings and special elements such as hashtags and user mentions; ALI in a multilingual environment is an even more challenging task. In a highly multilingual society, code-mixing without affecting the underlying language sense has become a natural phenomenon. In such a dynamic environment, conversational text alone often fails to identify the underlying languages present in the text. This paper proposes various methods of exploiting social conversational features to enhance ALI performance. Although social conversational features for ALI have been explored previously using methods such as probabilistic language modeling, these models often fail to address issues related to code-mixing, phonetic typing, out-of-vocabulary words and similar phenomena that are prevalent in a highly multilingual environment. This paper differs in the way the social conversational features are used to propose text refinement strategies that are suitable for ALI in a highly multilingual environment. The contributions of this paper therefore include the following. First, this paper analyzes the characteristics of various social conversational features by exploiting language usage patterns. Second, various methods of text refinement suitable for language identification are proposed. Third, the effects of the proposed refinement methods are investigated using various sentence-level language identification frameworks. From various experimental observations over three conversational datasets collected from the Facebook, YouTube and Twitter social media platforms, it is evident that our proposed method of ALI using social conversational features outperforms the baseline counterparts.
