Part-of-Speech Annotation of English-Assamese code-mixed texts: Two Approaches

Abstract

In this paper, we discuss the development of a part-of-speech tagger for English-Assamese code-mixed texts. We compare two approaches to annotating code-mixed data: (a) annotating the text of each language using that language's monolingual resources, and (b) annotating the text using a separate resource created specifically for code-mixed data. We present a comparative study of the effort required by each approach and the final performance of the resulting system. Based on this, we argue that developing new technologies using code-mixed data, rather than monolingual, 'clean' data, may be the better approach, especially for languages for which significant tools and technologies are not yet available.
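
The contrast between the two approaches can be sketched in a few lines of Python. This is only a toy illustration, not the authors' system: the lexicons, the tokens, the tags, and every function name below are hypothetical placeholders invented for the example, and a dictionary lookup stands in for a real tagger. Approach (a) first guesses each token's language and routes it to that language's monolingual resource; approach (b) uses a single resource built for code-mixed text.

```python
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical toy lexicons (placeholders, not taken from the paper).
EN_LEXICON: Dict[str, str] = {"i": "PRON", "read": "VERB", "book": "NOUN"}
AS_LEXICON: Dict[str, str] = {"moi": "PRON", "kitap": "NOUN"}
# One lexicon standing in for a resource created from code-mixed data.
MIXED_LEXICON: Dict[str, str] = {**EN_LEXICON, **AS_LEXICON}


def lexicon_tagger(lexicon: Dict[str, str]) -> Callable[[List[str]], Tagged]:
    """Build a trivial tagger: look each token up, fall back to 'X'."""
    def tag(tokens: List[str]) -> Tagged:
        return [(t, lexicon.get(t.lower(), "X")) for t in tokens]
    return tag


def identify_language(token: str) -> str:
    """Toy language identification: 'as' if in the Assamese lexicon, else 'en'."""
    return "as" if token.lower() in AS_LEXICON else "en"


def tag_with_monolingual_resources(tokens: List[str]) -> Tagged:
    """Approach (a): route each token to the tagger for its guessed language."""
    taggers = {"en": lexicon_tagger(EN_LEXICON), "as": lexicon_tagger(AS_LEXICON)}
    return [taggers[identify_language(t)]([t])[0] for t in tokens]


def tag_with_code_mixed_resource(tokens: List[str]) -> Tagged:
    """Approach (b): tag the whole sentence with one code-mixed resource."""
    return lexicon_tagger(MIXED_LEXICON)(tokens)


if __name__ == "__main__":
    sentence = ["moi", "read", "kitap"]
    print(tag_with_monolingual_resources(sentence))
    print(tag_with_code_mixed_resource(sentence))
```

In this sketch approach (a) needs an extra language-identification step and tags tokens in isolation, while approach (b) sees the sentence as a whole; how that trade-off plays out in practice is what the paper's comparison addresses.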

Bibliographic details

Source: International workshop on language cognition and computational models | 2018 | ix 103 p. | 10 pages
Authors: Ritesh Kumar; Manas Jyoti Bora
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature

1. Distributional Word Representations for Code-mixed Text in Moroccan Darija [J]. Mohamed Aghzal, Asmaa Mourhir. Procedia Computer Science. 2021, Issue a
2. Language identification framework in code-mixed social media text based on quantum LSTM - the word belongs to which language? [J]. Modern Physics Letters B: Condensed Matter Physics, Statistical Physics, Applied Physics. 2020, Issue 6
3. Detection of Hate Speech Text in Hindi-English Code-mixed Data [J]. K Sreelakshmi, B Premjith, K.P. Soman. Procedia Computer Science. 2020, Issue 5
4. Part-of-Speech Annotation of English-Assamese code-mixed texts: Two Approaches [C]. Ritesh Kumar, Manas Jyoti Bora. First international workshop on language cognition and computational models. 2018
5. Evaluating the effects of noninteractive and machine-assisted interactive manual clinical text annotation approaches on the quality of reference standards [D]. South, Brett Ray. 2014
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang, 2019
7. Part-of-speech Tagging of Code-Mixed Social Media Text [O]. Souvick Ghosh, Satanu Ghosh, Dipankar Das. 2016
8. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments [R]. Gimpel, K., Schneider, N., O'Connor, B., 2010