Conference on empirical methods in natural language processing

Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification



Abstract

Code-switching, the use of more than one language within a single utterance, is ubiquitous in much of the world, but remains a challenge for NLP largely due to the lack of representative data for training models. In this paper, we present a novel model architecture that is trained exclusively on monolingual resources, but can be applied to unseen code-switched text at inference time. The model accomplishes this by jointly maintaining separate word representations for each of the possible languages (or scripts, in the case of transliteration), allowing each to contribute to inferences without forcing the model to commit to a language. Experiments on Hindi-English part-of-speech tagging demonstrate that our approach outperforms standard models when training on monolingual text without transliteration, and testing on code-switched text with alternate scripts.
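The abstract's core idea, keeping a separate word representation per language or script and letting both contribute to tagging without a hard language decision, can be illustrated with a short sketch. The following PyTorch code is not the authors' implementation; the gating scheme, the BiLSTM tagger, and all names and dimensions are assumptions made only to show the general shape of such a model.

```python
# Illustrative sketch only: per-language/per-script embeddings combined by a soft
# gate, so no explicit language identification is required. All details below
# (gate, encoder, dimensions) are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class DualScriptTagger(nn.Module):
    def __init__(self, vocab_size_lang1, vocab_size_lang2, emb_dim, hidden_dim, num_tags):
        super().__init__()
        # One embedding table per language/script (e.g., Devanagari vs. romanized text).
        self.emb_lang1 = nn.Embedding(vocab_size_lang1, emb_dim)
        self.emb_lang2 = nn.Embedding(vocab_size_lang2, emb_dim)
        # A learned gate decides, per token, how much each language's view contributes.
        self.gate = nn.Linear(2 * emb_dim, 1)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.tag_proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, ids_lang1, ids_lang2):
        # ids_lang1 / ids_lang2: (batch, seq_len) indices of each token in each
        # language-specific vocabulary (an UNK index when the token is absent).
        e1 = self.emb_lang1(ids_lang1)
        e2 = self.emb_lang2(ids_lang2)
        alpha = torch.sigmoid(self.gate(torch.cat([e1, e2], dim=-1)))  # (batch, seq, 1)
        mixed = alpha * e1 + (1 - alpha) * e2  # soft mixture, no hard language choice
        h, _ = self.encoder(mixed)
        return self.tag_proj(h)  # per-token POS tag scores
```

In this reading, training can use monolingual data from each language separately (one embedding table receives meaningful indices, the other mostly UNK), while at inference time code-switched or transliterated tokens draw on both tables through the gate.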
