Workshop on Asian Language Resources

Feasibility of Leveraging Crowd Sourcing for the Creation of a Large Scale Annotated Resource for Hindi English Code Switched Data: A Pilot Annotation


Abstract

Linguistic code switching (LCS) occurs when speakers mix multiple languages within the same utterance. We find LCS to be pervasive in bilingual communities, and it poses a serious challenge to natural language and speech processing. With the ubiquity of informal genres online, LCS is emerging as a very widespread phenomenon. This paper presents a first attempt at collecting and annotating a large repository of LCS data, targeting Hindi-English (Hinglish) LCS. We investigate the feasibility of leveraging crowdsourcing to annotate the data at the word level. The paper briefly explains the experimental setup and data collection. It also presents statistics on inter-annotator agreement across the different possible categories of Hinglish words and analyzes how confidently humans can assign a code-switched word to the correct category.
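The abstract reports agreement statistics among crowd annotators but does not name the measure used. As a hedged illustration only, the sketch below assumes each word receives labels from the same number of annotators, uses a hypothetical label set (`hi`, `en`, `other`), and computes two common quantities: the per-word majority share (a rough confidence score) and Fleiss' kappa over all words. Neither the label set nor the choice of Fleiss' kappa comes from the paper; `word_agreement` and `fleiss_kappa` are illustrative helpers, and the votes are toy data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_agreement(votes: List[str]) -> Tuple[str, float]:
    """Majority label for one word plus the share of annotators who chose it."""
    label, top = Counter(votes).most_common(1)[0]
    return label, top / len(votes)


def fleiss_kappa(items: List[List[str]]) -> float:
    """Fleiss' kappa for items that each received the same number of votes."""
    n = len(items[0])  # annotators per word
    if n < 2 or any(len(v) != n for v in items):
        raise ValueError("need >= 2 votes per item and equal vote counts")
    N = len(items)
    # Expected (chance) agreement P_e from the overall share of each label.
    totals = Counter(label for votes in items for label in votes)
    p_e = sum((c / (N * n)) ** 2 for c in totals.values())
    # Observed agreement P_i: agreeing annotator pairs / all pairs, per item.
    p_obs = [
        (sum(c * c for c in Counter(votes).values()) - n) / (n * (n - 1))
        for votes in items
    ]
    p_bar = sum(p_obs) / N
    if p_e == 1.0:  # every vote has the same label; kappa is undefined
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Toy crowd labels (5 hypothetical annotators) for a made-up Hinglish sentence.
# The label set "hi" / "en" / "other" is an assumption, not the paper's scheme.
votes: Dict[str, List[str]] = {
    "yaar": ["hi", "hi", "hi", "hi", "en"],
    "meeting": ["en", "en", "en", "en", "en"],
    "cancel": ["en", "en", "hi", "en", "en"],
    "ho": ["hi", "hi", "hi", "hi", "hi"],
    "gayi": ["hi", "hi", "hi", "other", "hi"],
}

for word, labels in votes.items():
    label, conf = word_agreement(labels)
    print(f"{word}: {label} ({conf:.2f})")

print(f"Fleiss' kappa: {fleiss_kappa(list(votes.values())):.3f}")
```

The majority share is easy to read word by word, while kappa corrects for agreement expected by chance, which matters when one language dominates the data.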
