Venue: Conference on Empirical Methods in Natural Language Processing

All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media

Abstract

In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. In terms of Spearman's correlation, our methods perform more than two times better (~0.62) at predicting borrowing likeliness than the best-performing baseline (~0.26) reported in the literature. Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88% of cases the annotators felt that the foreign language tag should be replaced by a native language tag, indicating substantial scope for improving automatic language identification systems.
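
The evaluation described in the abstract compares a predicted borrowing-likeliness ranking against a reference ranking using Spearman's correlation. The following is a minimal sketch of that metric only, not the authors' method or code. The scores are made-up toy values rather than data from the paper, and all function and variable names are illustrative assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy scores for five candidate words (not from the paper).
predicted_scores = [0.9, 0.4, 0.7, 0.1, 0.5]   # scores from some ranking method
reference_scores = [0.8, 0.3, 0.9, 0.2, 0.4]   # reference (ground-truth) scores

rho = spearman(predicted_scores, reference_scores)
print(round(rho, 3))  # 0.9
```

A value near 1 means the predicted ordering closely matches the reference ordering. In these terms, the abstract reports roughly 0.62 for the proposed methods against roughly 0.26 for the best baseline.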