An Automatic Language Identification System for Code-Mixed English-Kannada Social Media Text

Abstract

The task of automatically identifying the language of a document or word is known as Language Identification (LID). With the growing popularity of social media and smart devices, a huge number of people have come online. The majority of user-generated data on the web is in code-mixed or multi-script form, where words are written in a non-native script. In this work, we focus on the problem of word-level LID for code-mixed data. The collected dataset contains English-Kannada code-mixed sentences from social media posts. Experiments with various supervised classifiers are performed, with an embedded dictionary module to handle word-level code-mixing.
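The abstract does not say how the dictionary module and the classifiers interact, so the following is only a minimal sketch of one plausible arrangement: look each word up in per-language dictionaries first, and fall back to a supervised classifier when the lookup is inconclusive. Everything in it is an assumption made for illustration: the toy word lists, the `en`/`kn` tag names, and the character-bigram Naive Bayes fallback, which stands in for whichever classifiers the paper actually evaluated.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicons (romanized Kannada); not the paper's dictionaries.
EN_DICT = {"the", "movie", "was", "good", "very", "song", "is", "nice"}
KN_DICT = {"tumba", "chennagide", "illa", "naanu", "beku", "hogi"}


def char_ngrams(word, n=2):
    """Return character n-grams of a word padded with boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Character n-gram Naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.priors = Counter()             # label -> training words seen
        self.vocab = set()

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            self.priors[label] += 1
            for gram in char_ngrams(word.lower(), self.n):
                self.counts[label][gram] += 1
                self.totals[label] += 1
                self.vocab.add(gram)
        return self

    def predict(self, word):
        n_words = sum(self.priors.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label in sorted(self.priors):
            score = math.log(self.priors[label] / n_words)
            for gram in char_ngrams(word.lower(), self.n):
                count = self.counts[label][gram]
                score += math.log((count + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def tag_sentence(sentence, clf):
    """Tag each whitespace-separated token: dictionary first, classifier second."""
    tags = []
    for token in sentence.split():
        word = token.lower()
        in_en, in_kn = word in EN_DICT, word in KN_DICT
        if in_en and not in_kn:
            tag = "en"
        elif in_kn and not in_en:
            tag = "kn"
        else:
            # Unknown or ambiguous word: defer to the supervised fallback.
            tag = clf.predict(word)
        tags.append((token, tag))
    return tags


# Train the fallback on the toy lexicons themselves (an assumption; a real
# system would train on a labelled code-mixed corpus).
clf = NgramNaiveBayes().fit(
    sorted(EN_DICT) + sorted(KN_DICT),
    ["en"] * len(EN_DICT) + ["kn"] * len(KN_DICT),
)
print(tag_sentence("Movie tumba chennagide", clf))
# [('Movie', 'en'), ('tumba', 'kn'), ('chennagide', 'kn')]
```

All three tokens in the usage line are dictionary hits, so the output does not depend on the classifier. Words found in neither or both lexicons are the cases where the fallback decides, and its quality with such tiny training data should not be taken as representative.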

Bibliographic details

Source
2017 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution | 2017 | pp. 1-5 | 5 pages
Conference location: Bangalore (IN)
Authors
B S Sowmya Lakshmi; B R Shambhavi
Author affiliations

BMS College of Engineering, Department of ISE, Bangalore, Karnataka, India;

BMS College of Engineering, Department of ISE, Bangalore, Karnataka, India;

Original format: PDF
Language: English
Keywords
Feature extraction; Dictionaries; Hidden Markov models; Classification algorithms; Logistics; Facebook
Date indexed: 2022-08-26 13:58:48

Similar documents

1. An effective cybernated word embedding system for analysis and language identification in code-mixed social media text [J]. Shekhar Shashi, Sharma Dilip Kumar, Sufyan Beg M.M. International Journal of Knowledge-Based and Intelligent Engineering Systems. 2019, No. 3
2. Language identification framework in code-mixed social media text based on quantum LSTM - the word belongs to which language? [J]. Modern Physics Letters B: Condensed Matter Physics, Statistical Physics, Applied Physics. 2020, No. 6
3. Deep Learning-Based Language Identification in English-Hindi-Bengali Code-Mixed Social Media Corpora [J]. Anupam Jamatia, Amitava Das, Björn Gambäck. Journal of Intelligent Systems. 2019, No. 3
4. An Automatic Language Identification System for Code-Mixed English-Kannada Social Media Text [C]. B S Sowmya Lakshmi, B R Shambhavi. International Conference on Computation Systems and Information Technology for Sustainable Solutions. 2017
5. Identification of concepts from emergency department text using natural language processing techniques and the Unified Medical Language System RTM. [D]. Travers, Debbie. 2003
6. Text classification models for the automatic detection of nonmedical prescription medication use from social media [O]. Mohammed Ali Al-Garadi, Yuan-Chi Yang, Haitao Cai. 2021
7. Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text [O]. Das Amitava, Gambäck Björn. 2016
8. Development and Evaluation of Self-Instructional Texts and an Operational Specification for Computer Directed Training in Intermediate Query Language, Model 11, for System 473L, United States Air Force Headquarters. [R]. Slough, D. C., Yens, D. P., Northrup, J. L. 1966
