Tagging a Norwegian Dialect Corpus

Abstract

This paper describes an evaluation of five data-driven part-of-speech (PoS) taggers for spoken Norwegian. Each tagger relies on a different machine learning mechanism: decision trees, hidden Markov models (HMMs), conditional random fields (CRFs), long short-term memory networks (LSTMs), or convolutional neural networks (CNNs). We discuss some of the challenges posed by tagging spoken, as opposed to written, language, in particular the wide range of dialects found in the recordings of the LIA (Language Infrastructure made Accessible) project. The results show that the taggers based on either conditional random fields or neural networks perform much better than the others, with the LSTM tagger achieving the highest score.
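The abstract does not state which score is used to compare the taggers. As a minimal sketch, assuming per-token tagging accuracy against gold-standard tags (a common choice for PoS tagging, but an assumption here), a comparison of several taggers on the same transcribed utterances could look like this. The function names `token_accuracy` and `rank_taggers`, the tagger names, and the tags in the toy data are all hypothetical and are not taken from the LIA corpus or its tagset.

```python
from typing import List, Mapping, Sequence, Tuple

# Each utterance is a sequence of PoS tags, one per token (hypothetical format).
Tagged = Sequence[Sequence[str]]


def token_accuracy(gold: Tagged, predicted: Tagged) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of utterances")
    correct = 0
    total = 0
    for gold_utt, pred_utt in zip(gold, predicted):
        if len(gold_utt) != len(pred_utt):
            raise ValueError("each utterance must have the same token count in gold and predicted")
        correct += sum(g == p for g, p in zip(gold_utt, pred_utt))
        total += len(gold_utt)
    return correct / total if total else 0.0


def rank_taggers(gold: Tagged, outputs: Mapping[str, Tagged]) -> List[Tuple[str, float]]:
    """Score every tagger's output on the same gold data, best first."""
    scores = {name: token_accuracy(gold, pred) for name, pred in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data only: two short utterances, e.g. "eg veit ikkje" / "ja eg trur".
    gold = [["PRON", "VERB", "ADV"], ["INTJ", "PRON", "VERB"]]
    outputs = {
        "tagger_a": [["PRON", "VERB", "ADV"], ["INTJ", "PRON", "VERB"]],
        "tagger_b": [["PRON", "VERB", "ADV"], ["INTJ", "PRON", "NOUN"]],
    }
    for name, score in rank_taggers(gold, outputs):
        print(f"{name}: {score:.3f}")
```

Scoring all taggers against one shared gold set keeps the comparison fair. With real spoken data, tokenization choices (for example, how pauses, repairs, or dialectal contractions are split) would have to be identical across taggers for this per-token count to be meaningful.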