All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media

Abstract

In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. In terms of Spearman's correlation values, our methods perform more than two times better (~0.62) in predicting the borrowing likeliness compared to the best-performing baseline (~0.26) reported in the literature. Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88% of cases the annotators felt that the foreign language tag should be replaced by the native language tag, thus indicating a huge scope for improvement of automatic language identification systems.
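The abstract reports its results as Spearman's correlation between a predicted ranking and a reference ranking. As a minimal sketch of how such a value is computed, the snippet below ranks two score lists and takes the Pearson correlation of the ranks. The scores are made-up placeholders, not data from the paper, and the function names `average_ranks` and `spearman_rho` are hypothetical helpers; the paper's own features and ranking methods are not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Placeholder borrowing-likeliness scores for five unnamed words (illustrative only).
predicted = [0.91, 0.15, 0.40, 0.78, 0.55]   # scores from some automatic method
reference = [0.85, 0.30, 0.20, 0.60, 0.70]   # scores from some ground-truth ranking

rho = spearman_rho(predicted, reference)
print(round(rho, 3))
```

A value near 1 means the two rankings order the words almost identically, a value near 0 means they are unrelated; this is the scale on which the reported ~0.62 and ~0.26 should be read.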

Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2017 | lxxiii p. 2243-2979 | 11 pages
Authors
Jasabanta Patro; Bidisha Samanta; Saurabh Singh; Abhipsa Basu; Prithwish Mukherjee; Monojit Choudhury; Animesh Mukherjee
Original format: PDF
Chinese Library Classification: Program design, software engineering

Similar literature

1. Deep Learning-Based Language Identification in English-Hindi-Bengali Code-Mixed Social Media Corpora [J]. Anupam Jamatia, Amitava Das, Björn Gambäck. Journal of Intelligent Systems. 2019, No. 3
2. Text independent root word identification in Hindi language using natural language processing [J]. Leena Jain, Prateek Agrawal. International journal of advanced intelligence paradigms. 2015, No. 3a4
3. Word sense-based approach for Hindi to Tamil machine translation using English as pivot language [J]. K. Vimal Kumar, Divakar Yadav. International journal of advanced intelligence paradigms. 2018, No. 3a4
4. All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media [C]. Jasabanta Patro, Bidisha Samanta, Saurabh Singh. Conference on empirical methods in natural language processing. 2017
5. A comparison of Spanish and English multimedia shared story interventions on the acquisition of English vocabulary words for English language learners with an intellectual disability [D]. Rivera, Christopher Juan. 2011
6. Using Nonword Repetition Tasks for the Identification of Language Impairment in Spanish-English Speaking Children: Does the Language of Assessment Matter? [O]. Vera F. Gutiérrez-Clellen, Gabriela Simon-Cereijido
7. All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media [O]. Patro, Jasabanta, Samanta, Bidisha, Singh, Saurabh. 2017
