
Separate training for conditional random fields using co-occurrence rate factorization


Abstract

Conditional Random Fields (CRFs) are undirected graphical models that are well suited to many natural language processing (NLP) tasks, such as part-of-speech (POS) tagging and named entity recognition (NER). The standard training method for CRFs can be very slow in large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time. However, piecewise training does not scale well with variable cardinality. In this paper we present separate training for undirected models, based on the novel Co-occurrence Rate factorization (CR-F). Separate training is a local training method without global propagation. In contrast to directed Markov models such as MEMMs, separate training is unaffected by the label bias problem even though it is a locally normalized method. We run experiments on two NLP tasks, POS tagging and NER. The results show that separate training (i) is unaffected by the label bias problem; (ii) reduces the training time from weeks to seconds; and (iii) obtains results competitive with standard and piecewise training on linear-chain CRFs. Separate training is a promising technique for scaling undirected models for natural language processing tasks. (More details can be found here: http://eprints.eemcs.utwente.nl/22600/)
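The abstract does not spell out the factorization itself. As a rough illustration only, the sketch below assumes the co-occurrence rate of two variables is CR(a; b) = P(a, b) / (P(a) P(b)) and checks, on a toy first-order label chain, that the joint probability equals the product of per-position marginals and pairwise co-occurrence rates. Every factor in that product is a local quantity, which is what makes estimating the factors separately plausible. The numbers in `INIT` and `TRANS` and all function names are hypothetical, and observations are left out for brevity; this is not the paper's implementation.

```python
import itertools

# Hypothetical two-label chain (labels 0 and 1); all numbers are made up.
INIT = [0.6, 0.4]                   # P(y_1)
TRANS = [[0.7, 0.3], [0.2, 0.8]]    # P(y_{i+1} | y_i)


def marginals(n):
    """Per-position marginals P(y_i) for a chain of length n."""
    ms = [INIT[:]]
    for _ in range(n - 1):
        prev = ms[-1]
        ms.append([sum(prev[a] * TRANS[a][b] for a in range(2))
                   for b in range(2)])
    return ms


def chain_prob(ys):
    """Joint probability from the directed (Markov) parameterization."""
    p = INIT[ys[0]]
    for a, b in zip(ys, ys[1:]):
        p *= TRANS[a][b]
    return p


def cr_prob(ys):
    """Same joint, written as unary marginals times pairwise co-occurrence rates."""
    ms = marginals(len(ys))
    p = 1.0
    for i, y in enumerate(ys):
        p *= ms[i][y]                               # unary factor P(y_i)
    for i, (a, b) in enumerate(zip(ys, ys[1:])):
        joint = ms[i][a] * TRANS[a][b]              # P(y_i = a, y_{i+1} = b)
        p *= joint / (ms[i][a] * ms[i + 1][b])      # CR(y_i; y_{i+1})
    return p


# The two parameterizations agree on every label sequence of length 3.
for ys in itertools.product(range(2), repeat=3):
    assert abs(chain_prob(ys) - cr_prob(ys)) < 1e-12
```

In this toy setting the factors are computed from known probabilities; in a trained model each unary and pairwise factor would instead be estimated from data, without propagating information across the whole chain.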
