Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification

Abstract

Code-switching, the use of more than one language within a single utterance, is ubiquitous in much of the world, but remains a challenge for NLP, largely due to the lack of representative data for training models. In this paper, we present a novel model architecture that is trained exclusively on monolingual resources, but can be applied to unseen code-switched text at inference time. The model accomplishes this by jointly maintaining separate word representations for each of the possible languages (or scripts, in the case of transliteration), allowing each to contribute to inferences without forcing the model to commit to a language. Experiments on Hindi-English part-of-speech tagging demonstrate that our approach outperforms standard models when training on monolingual text without transliteration and testing on code-switched text with alternate scripts.
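The abstract gives no implementation details, so the following is only a rough illustration of the idea of keeping one word representation per language without choosing a language for the token. It is not the authors' architecture. The embedding tables, the transliteration map, the zero vector for unknown words, and the names `lookup` and `joint_representation` are all hypothetical, and concatenation is just one assumed way of letting each language contribute.

```python
from typing import Dict, List

Vector = List[float]

DIM = 2  # toy embedding size (assumption for this sketch)

# Hypothetical monolingual embedding tables, one per language.
TABLES: Dict[str, Dict[str, Vector]] = {
    "en": {"good": [1.0, 0.0]},
    "hi": {"अच्छा": [0.0, 1.0]},
}

# Hypothetical map from a romanized surface form to the native-script form.
TRANSLIT: Dict[str, Dict[str, str]] = {
    "hi": {"accha": "अच्छा"},
}


def lookup(table: Dict[str, Vector], word: str, dim: int) -> Vector:
    """Return the word's vector, or a zero vector if the word is unknown."""
    return table.get(word, [0.0] * dim)


def joint_representation(
    token: str,
    tables: Dict[str, Dict[str, Vector]],
    translit: Dict[str, Dict[str, str]],
    dim: int,
) -> Vector:
    """Concatenate one vector per language, in sorted language order.

    No language-identification decision is made: every language's table
    is consulted, and unknown words simply contribute a zero vector.
    """
    parts: Vector = []
    for lang in sorted(tables):
        # Use the native-script form if a transliteration is known.
        form = translit.get(lang, {}).get(token, token)
        parts.extend(lookup(tables[lang], form, dim))
    return parts


if __name__ == "__main__":
    for tok in ["accha", "good"]:
        print(tok, joint_representation(tok, TABLES, TRANSLIT, DIM))
```

In this toy setup, a downstream tagger would receive the concatenated vector for each token and could learn which language's slice is informative, instead of relying on an explicit language-identification step.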

Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2018 | cxvi p. 2890-3611 | 6 pages
Authors
Kelsey Ball; Dan Garrette
Original format
PDF
CLC classification
Programming, software engineering
