Journal: ACM Transactions on Asian Language Information Processing

From Genesis to Creole Language: Transfer Learning for Singlish Universal Dependencies Parsing and POS Tagging

Abstract

Singlish is of interest to the computational linguistics community both linguistically, as a major low-resource creole based on English, and computationally, for information extraction and sentiment analysis of regional social media. In our conference paper, Wang et al. (2017), we investigated part-of-speech (POS) tagging and dependency parsing for Singlish by constructing a treebank under the Universal Dependencies scheme, and we used neural stacking models to integrate English syntactic knowledge, achieving state-of-the-art accuracies of 89.50% and 84.47% for Singlish POS tagging and dependency parsing, respectively. In this work, we substantially extend Wang et al. (2017) by enlarging the Singlish treebank to more than triple its size, with much greater topical diversity, and by further exploring neural multi-task models for integrating English syntactic knowledge. Results show that the enlarged treebank yields significant relative error reductions of 45.8% and 15.5% on the base model, 27% and 10% on the neural multi-task model, and 21% and 15% on the neural stacking model for POS tagging and dependency parsing, respectively. Moreover, the state-of-the-art Singlish POS tagging and dependency parsing accuracies improve to 91.16% and 85.57%, respectively. We make our treebanks and models available for further research.
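The abstract reports its gains as relative error reductions but does not define the term. A minimal sketch of the conventional calculation follows, assuming the usual definition (the drop in error rate divided by the original error rate). The function name and the sample accuracies are hypothetical illustrations, not figures from the paper.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Return the relative error reduction, in percent.

    Both arguments are accuracies in percent (0-100). This follows the
    conventional definition, assumed here rather than taken from the paper:
    (error_before - error_after) / error_before.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    if err_before <= 0.0:
        raise ValueError("acc_before must be below 100% to have an error to reduce")
    return 100.0 * (err_before - err_after) / err_before


if __name__ == "__main__":
    # Hypothetical accuracies chosen only to illustrate the arithmetic:
    # the error falls from 20 points to 15 points, a quarter of the original.
    print(relative_error_reduction(80.0, 85.0))
```

Under this definition, a gain of a few accuracy points can correspond to a large relative reduction when the starting error is already small, which is why the two kinds of figures are not directly comparable.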
