This paper presents a novel method that uses graph-based semi-supervised learning (SSL) to improve the syntactic parsing of unknown words. Unlike conventional approaches that use hand-crafted rules, rich morphological features, or a character-based model to handle unknown words, this method is based on graph-based label propagation. It gives greater improvement for grammars trained on a smaller amount of labeled data together with a large amount of unlabeled data. A transductive graph-based SSL method is employed to propagate POS labels and derive emission distributions from the labeled data to the unlabeled data. The derived distributions are then incorporated into the parsing process. The proposed method effectively augments the original supervised parsing model, contributing absolute improvements of 2.28% and 1.72% in POS tagging and syntactic parsing accuracy, respectively, on the Penn Chinese Treebank.
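To make the propagation step concrete, the following is a minimal sketch of generic transductive label propagation over a weighted similarity graph. It is not the paper's exact formulation: the abstract does not specify the graph construction, the objective, or how the distributions enter the parser. The function name `propagate_labels`, the vertex names, the edge weights, and the two-tag set are all hypothetical and chosen only for illustration. Labeled vertices keep their seed POS distributions, and each unlabeled vertex repeatedly takes the weighted average of its neighbours' distributions.

```python
def propagate_labels(edges, seed, tags, iterations=50):
    """Generic label propagation sketch (assumed, not the paper's method).

    edges: {(u, v): weight} undirected similarity edges
    seed:  {vertex: {tag: prob}} distributions for labeled vertices (clamped)
    tags:  list of POS tags
    """
    # Build a symmetric weighted adjacency map.
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w

    # Unlabeled vertices start from a uniform distribution over tags.
    uniform = {t: 1.0 / len(tags) for t in tags}
    dist = {v: dict(seed.get(v, uniform)) for v in neighbours}

    for _ in range(iterations):
        updated = {}
        for v, nbrs in neighbours.items():
            if v in seed:
                # Labeled vertices are clamped to their seed distribution.
                updated[v] = dict(seed[v])
                continue
            total = sum(nbrs.values())
            # Weighted average of the neighbours' current distributions.
            updated[v] = {
                t: sum(w * dist[u][t] for u, w in nbrs.items()) / total
                for t in tags
            }
        dist = updated
    return dist


# Hypothetical toy graph: one unknown word linked to two labeled words.
tags = ["NN", "VV"]
edges = {
    ("known_noun", "unknown_word"): 0.9,
    ("known_verb", "unknown_word"): 0.1,
}
seed = {
    "known_noun": {"NN": 1.0, "VV": 0.0},
    "known_verb": {"NN": 0.0, "VV": 1.0},
}
result = propagate_labels(edges, seed, tags)
```

In this toy setup the unknown word ends up with most of its probability mass on the tag of its strongest neighbour. A distribution of this kind could then serve as an emission estimate for a word the supervised grammar never saw, which is the role the abstract assigns to the derived distributions.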