Better, Faster, Stronger Sequence Tagging Constituent Parsers

Abstract

Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.
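
As a rough illustration of point (b), the sketch below splits composite per-word labels into sublabels and compares the resulting label-set sizes. The three-part label format, the `@` separator, the sample labels, and the function names `decompose` and `label_set_sizes` are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical composite labels: "<level>@<category>@<unary chain>".
# The format and the separator are assumptions for illustration only.
SEP = "@"


def decompose(label: str) -> Tuple[str, str, str]:
    """Split one composite label into its three sublabels."""
    level, category, unary = label.split(SEP)
    return level, category, unary


def label_set_sizes(labels: List[str]) -> Tuple[int, Tuple[int, int, int]]:
    """Return the composite label count and the per-task sublabel counts."""
    parts = [decompose(label) for label in labels]
    # Transpose the list of triples into three columns, one per sublabel task.
    levels, categories, unaries = (set(column) for column in zip(*parts))
    return len(set(labels)), (len(levels), len(categories), len(unaries))


# Made-up sample labels, one per word of an imaginary sentence.
labels = ["1@S@-", "2@NP@-", "-1@NP@-", "1@VP@ADJP", "-2@S@-", "2@VP@-"]
composite_size, sublabel_sizes = label_set_sizes(labels)
print(composite_size, sublabel_sizes)
```

In a multi-task setup along these lines, each sublabel would get its own output layer, so every classifier predicts over a smaller and denser label set than a single classifier over composite labels would.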

Bibliographic details

Source: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 3372-3383 (12 pages)
Authors: David Vilares; Mostafa Abdou; Anders Søgaard
Original format: PDF
