
Better, Faster, Stronger Sequence Tagging Constituent Parsers


Abstract

Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.
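The label decomposition in point (b) can be illustrated with a minimal sketch. It is not the authors' implementation: the combined tag format `<level>@<nonterminal>@<unary>`, the `@` separator, and the helper names `split_label` and `label_space_sizes` are all assumptions made for illustration. The sketch only shows why predicting sublabels as separate tasks shrinks the output space: combined labels can grow multiplicatively, while sublabel inventories grow additively.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical combined-tag format: "<level>@<nonterminal>@<unary>".
# The separator and the field order are assumptions, not taken from the paper.
SEP = "@"


def split_label(label: str) -> Tuple[str, str, str]:
    """Split one combined tag into three sublabels, one per prediction task."""
    level, nonterminal, unary = label.split(SEP)
    return level, nonterminal, unary


def label_space_sizes(labels: List[str]) -> Tuple[int, int]:
    """Return (number of distinct combined labels,
    total number of distinct sublabels summed over the three tasks)."""
    parts = [split_label(label) for label in labels]
    per_task = [set(column) for column in zip(*parts)]
    return len(set(labels)), sum(len(values) for values in per_task)


# Toy inventory: every combination of 3 levels, 3 nonterminals, 2 unary options.
levels = ["-1", "1", "2"]
nonterminals = ["S", "NP", "VP"]
unaries = ["", "ADJP"]
tags = [SEP.join(combo) for combo in product(levels, nonterminals, unaries)]

combined, decomposed = label_space_sizes(tags)
print(f"combined labels: {combined}, sublabels across tasks: {decomposed}")
```

In this toy inventory every combination occurs, which real treebanks do not guarantee, so the gap between the two counts is only indicative. In a multi-task setup, each of the three sublabel inventories would get its own output layer on top of a shared encoder.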
