Conference: 2017 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution

An Automatic Language Identification System for Code-Mixed English-Kannada Social Media Text


Abstract

Language Identification (LID) is the task of automatically identifying the language of a document or word. With the growing popularity of social media and smart devices, a huge number of people have come online. The majority of user-generated data on the web is in code-mixed or multi-script form, where words are written in a non-native script. In this work, we focus on the problem of word-level LID for code-mixed data. The collected dataset contains English-Kannada code-mixed sentences from social media posts. We run experiments with various supervised classifiers, embedding a dictionary module to handle word-level code mixing.
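
To make the idea of a dictionary module combined with a supervised classifier concrete, here is a minimal sketch in Python. It is not the authors' system: the abstract does not name the classifiers, features, or lexicons used, so everything below is an assumption for illustration. The word lists, the training pairs, the character-bigram Naive Bayes model, and the names `NgramNaiveBayes` and `tag_sentence` are all hypothetical. Each token is first looked up in the lexicons; only tokens the lexicons cannot decide are passed to the classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicons (romanized Kannada); not taken from the paper.
ENGLISH_WORDS = {"the", "movie", "good", "is", "very"}
KANNADA_WORDS = {"tumba", "chennagide", "nanu", "illa", "hegide"}

# Hypothetical labeled training words for the fallback classifier.
TRAIN = [
    ("good", "en"), ("movie", "en"), ("today", "en"),
    ("friend", "en"), ("thanks", "en"),
    ("tumba", "kn"), ("chennagide", "kn"), ("hogona", "kn"),
    ("beku", "kn"), ("gottilla", "kn"),
]


def char_ngrams(word, n=2):
    """Return character n-grams of a word padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character bigrams with add-one smoothing."""

    def fit(self, pairs):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.labels = Counter()             # label -> number of training words
        for word, label in pairs:
            self.labels[label] += 1
            self.counts[label].update(char_ngrams(word))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, word):
        best_label, best_score = None, -math.inf
        total = sum(self.labels.values())
        for label in sorted(self.labels):
            # One extra slot in the denominator covers unseen bigrams.
            denom = sum(self.counts[label].values()) + len(self.vocab) + 1
            score = math.log(self.labels[label] / total)
            for gram in char_ngrams(word):
                score += math.log((self.counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def tag_sentence(sentence, model):
    """Tag each token: dictionary lookup first, classifier as fallback."""
    tags = []
    for token in sentence.split():
        key = token.lower()
        if key in ENGLISH_WORDS and key not in KANNADA_WORDS:
            tags.append((token, "en"))
        elif key in KANNADA_WORDS and key not in ENGLISH_WORDS:
            tags.append((token, "kn"))
        else:
            # Unknown or ambiguous token: defer to the trained model.
            tags.append((token, model.predict(token)))
    return tags


model = NgramNaiveBayes().fit(TRAIN)
print(tag_sentence("movie tumba chennagide", model))
```

Checking the dictionary before the classifier is one plausible way to combine the two components; a real system could instead feed dictionary membership to the classifier as a feature, and the abstract does not say which design was used.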
