Conference on Empirical Methods in Natural Language Processing; Workshop on Computational Approaches to Code Switching

Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English Language Data

Abstract

Codeswitching is a very common behavior among Swahili speakers, but of the little computational work done on Swahili, none has focused on codeswitching. This paper addresses two tasks relating to Swahili-English codeswitching: word-level language identification and prediction of codeswitch points. Our two-step model achieves high accuracy at labeling the language of words using a simple feature set combined with label probabilities on the adjacent words. This system is used to label a large Swahili-English internet corpus, which is in turn used to train a model for predicting codeswitch points.
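
The abstract does not give the model's details, so the following is only a rough sketch of what a two-step labeler of this general shape could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the toy word lists, the character-bigram naive Bayes used for the first step, the fixed interpolation weight `self_weight` used in place of a trained second-step classifier, and all function names.

```python
import math
from collections import Counter

# Toy training words; hypothetical, not taken from the paper's data.
TRAIN = {
    "sw": ["nina", "kwenda", "sokoni", "leo", "rafiki",
           "yangu", "chakula", "sana", "habari", "nzuri"],
    "en": ["the", "going", "market", "today", "friend",
           "my", "food", "very", "news", "good"],
}


def char_ngrams(word, n=2):
    """Character n-grams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(train_words):
    """Count character bigrams per language and collect the shared vocabulary."""
    counts = {
        lang: Counter(g for w in words for g in char_ngrams(w))
        for lang, words in train_words.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def step1_prob_sw(word, counts, vocab):
    """Step 1 (assumed): P(Swahili | word) from add-one smoothed naive Bayes."""
    log_scores = {}
    for lang, c in counts.items():
        total = sum(c.values()) + len(vocab) + 1  # +1 reserves mass for unseen bigrams
        log_scores[lang] = sum(
            math.log((c[g] + 1) / total) for g in char_ngrams(word)
        )
    top = max(log_scores.values())
    exp = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    return exp["sw"] / sum(exp.values())


def step2_labels(words, counts, vocab, self_weight=0.6):
    """Step 2 (assumed): mix each word's probability with its neighbors' probabilities."""
    probs = [step1_prob_sw(w, counts, vocab) for w in words]
    labels = []
    for i, p in enumerate(probs):
        neighbors = [probs[j] for j in (i - 1, i + 1) if 0 <= j < len(probs)]
        context = sum(neighbors) / len(neighbors) if neighbors else p
        combined = self_weight * p + (1 - self_weight) * context
        labels.append("sw" if combined >= 0.5 else "en")
    return labels


def switch_points(labels):
    """Indices where the language label differs from the previous word's label."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    counts, vocab = train(TRAIN)
    sentence = "nina kwenda market leo".split()
    labels = step2_labels(sentence, counts, vocab)
    print(list(zip(sentence, labels)), switch_points(labels))
```

A labeler like this could then be run over a larger unlabeled corpus, and the resulting `switch_points` positions used as training targets for a separate codeswitch-point predictor, which is the pipeline the abstract describes at a high level.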