Journal: Neural Networks: The Official Journal of the International Neural Network Society

Multilingual part-of-speech tagging with weightless neural networks


Abstract

Training part-of-speech taggers (POS-taggers) requires iterative, time-consuming, convergence-dependent steps: expectation maximization for stochastic taggers, or weight balancing for neural ones. Because of the complexity of these steps, multilingual part-of-speech tagging can become intractable, since the time they demand grows with the number of languages. WiSARD (Wilkie, Stonham and Aleksander's Recognition Device), a weightless artificial neural network architecture that has proved both robust and efficient in classification tasks, has previously been used to speed up the training phase. WiSARD is a RAM-based system that requires only one memory writing operation to train each sentence. Additionally, the mechanism can learn new tagged sentences incrementally during the classification phase. Nevertheless, parameters such as RAM size, context window, and probability bit mapping make the multilingual part-of-speech tagging task hard. This article proposes mWANN-Tagger (multilingual Weightless Artificial Neural Network tagger), a WiSARD POS-tagger. Because of its one-pass learning capability, the tagger allows language-specific parameter configurations to be searched thoroughly and quickly. Experimental evaluation indicates that mWANN-Tagger either outperforms or matches state-of-the-art methods in accuracy, with a very low standard deviation (lower than 0.25%). Experimental results also suggest that the vast majority of the languages can benefit from this architecture. (C) 2015 Elsevier Ltd. All rights reserved.
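
To make the RAM-based, one-pass idea in the abstract concrete, here is a minimal sketch of a generic WiSARD-style classifier. It is not the mWANN-Tagger implementation: the abstract does not describe how words and context windows are encoded into bits, so the 8-bit patterns, the tag labels, and the names `WiSARD`, `Discriminator`, `tuple_size`, and `seed` are all illustrative assumptions. The sketch only shows that training writes memory addresses once, with no iterative weight updates, and that new examples can be learned after classification has started.

```python
import random


class Discriminator:
    """RAM nodes for one class; each RAM is modeled as a set of seen addresses."""

    def __init__(self, n_rams):
        self.rams = [set() for _ in range(n_rams)]

    def train(self, addresses):
        # One write per RAM node: mark the addressed cell as seen.
        for ram, addr in zip(self.rams, addresses):
            ram.add(addr)

    def score(self, addresses):
        # Count how many RAM nodes recognize their address.
        return sum(addr in ram for ram, addr in zip(self.rams, addresses))


class WiSARD:
    """Hypothetical minimal WiSARD: one discriminator per class label."""

    def __init__(self, input_bits, tuple_size, seed=0):
        if input_bits % tuple_size != 0:
            raise ValueError("input_bits must be a multiple of tuple_size")
        self.input_bits = input_bits
        self.tuple_size = tuple_size  # address width of each RAM (assumed parameter)
        self.n_rams = input_bits // tuple_size
        # Fixed pseudo-random mapping from input positions to RAM address lines.
        self.mapping = list(range(input_bits))
        random.Random(seed).shuffle(self.mapping)
        self.discriminators = {}

    def _addresses(self, bits):
        if len(bits) != self.input_bits:
            raise ValueError("unexpected input length")
        shuffled = [bits[j] for j in self.mapping]
        return [
            tuple(shuffled[i:i + self.tuple_size])
            for i in range(0, self.input_bits, self.tuple_size)
        ]

    def train(self, bits, label):
        # Single pass over the pattern; no convergence loop.
        disc = self.discriminators.setdefault(label, Discriminator(self.n_rams))
        disc.train(self._addresses(bits))

    def scores(self, bits):
        addresses = self._addresses(bits)
        return {label: d.score(addresses) for label, d in self.discriminators.items()}

    def classify(self, bits):
        s = self.scores(bits)
        return max(s, key=s.get)


# Toy usage: the bit patterns stand in for encoded word contexts (assumption).
net = WiSARD(input_bits=8, tuple_size=2)
net.train([1, 1, 1, 1, 0, 0, 0, 0], "NOUN")
net.train([0, 0, 0, 0, 1, 1, 1, 1], "VERB")

noisy = [1, 1, 1, 0, 0, 0, 0, 0]  # one bit away from the NOUN pattern
print(net.classify(noisy), net.scores(noisy))

# Incremental learning: add the new example without retraining anything else.
net.train(noisy, "NOUN")
print(net.scores(noisy))
```

In this sketch `tuple_size` plays the role of the RAM size mentioned in the abstract; because training is a single pass, trying several values per language is cheap, which is the property the abstract relies on for its parameter search.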
