Conference: Workshop on Semantic Deep Learning

Bridging the Gap: Improve Part-of-speech Tagging for Chinese Social Media Texts with Foreign Words



Abstract

Multilingual speakers often switch between languages, generating enormous quantities of cross-language data. This phenomenon is observed more frequently in social media texts, where a large body of user-generated data is produced every day. Such mixed-language, informal texts pose a challenge for part-of-speech (POS) tagging, a fundamental task in natural language processing. In this paper, we propose a language-agnostic POS tagger for social media texts that can learn from heterogeneous data of different genres and language types. In particular, to evaluate POS tagging performance comprehensively, we propose a new tagging scheme that includes exclusive tags for special symbols in social media texts, and we develop a human-annotated dataset of Chinese-English mixed social media texts. Experiments on both synthetic and real datasets show the validity and effectiveness of our model on social media texts, where it outperforms state-of-the-art language-specific taggers.
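
To make the idea of exclusive tags for special symbols concrete, here is a minimal Python sketch of a rule-based pre-tagging pass over Chinese-English mixed text. It is not the tagger described in the abstract. The label names (URL, HASHTAG, MENTION, EMOTICON, LATIN, HAN), the regular expressions, and the function name `pretag` are all assumptions made for illustration; the abstract does not list the actual tag set. LATIN and HAN are only script classes, and a real POS tagger would still need to segment and tag those spans.

```python
import re

# Hypothetical labels and patterns, for illustration only; the abstract does
# not specify the real tag set. Order matters: earlier patterns win.
TOKEN_PATTERNS = [
    ("URL", r"https?://[^\s\u4e00-\u9fff]+"),
    ("HASHTAG", r"#[^#\s]+#"),                    # Weibo-style #topic#
    ("MENTION", r"@[A-Za-z0-9_\u4e00-\u9fff-]+"),  # @username
    ("EMOTICON", r"\[[^\[\]\s]{1,8}\]"),           # bracketed emoticon codes
    ("LATIN", r"[A-Za-z]+(?:'[A-Za-z]+)?"),        # run of Latin letters
    ("HAN", r"[\u4e00-\u9fff]+"),                  # run of Chinese characters
]

# One alternation of named groups; the name of the matching group is the label.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def pretag(text):
    """Return (span, label) pairs for each recognized span, in text order.

    Characters that match no pattern (spaces, punctuation) are skipped.
    """
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    sample = "@小明 今天去shopping了 [哈哈] #周末# http://t.cn/abc"
    for span, label in pretag(sample):
        print(label, span)
```

Separating special symbols before tagging keeps them from being mislabeled as ordinary words, and the script split shows where a mixed-language sentence changes language. A learned tagger could use such labels as input features or as fixed tags.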
