Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Graph-Based Lexicon Regularization for PCFG With Latent Annotations

Abstract

This paper aims to learn a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose using GP to regularize the lexical model of the grammar. The approach constructs k-nearest neighbor (k-NN) similarity graphs over words that share the same pre-terminal (part-of-speech) tag and uses them to propagate the probabilities of latent annotations given the words. The graphs capture syntactic and semantic relationships between words, estimated with a neural word representation method based on the recursive autoencoder (RAE). We modify the conventional PCFG-LA parameter estimation algorithm, expectation maximization (EM), by adding a GP step after the M-step. GP encourages smoothness across the graph vertices, so that different words occurring in similar syntactic and semantic environments have similar posterior distributions over nonterminal subcategories. The proposed PCFG-LA learning approach was evaluated together with a hierarchical split-and-merge training strategy on parsing tasks for English, Chinese, and Portuguese. The empirical results reveal two key findings: 1) regularizing the lexicons with GP improves parsing accuracy; and 2) learning with unlabeled data can also expand the PCFG-LA lexicons.
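The abstract does not state the exact propagation rule, so the following is only a minimal sketch of the general idea: build a k-NN graph over words that share a tag, then smooth each word's distribution over latent subcategories toward its neighbors' distributions. The function names `knn_graph` and `propagate`, the cosine similarity measure, the interpolation weight `mu`, and the iteration count are all assumptions made for illustration and are not taken from the paper. Any word vectors can stand in for the RAE-based representations.

```python
import numpy as np

def knn_graph(embeddings, k):
    """Build a k-NN similarity graph (assumed: cosine similarity)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)           # a word is not its own neighbor
    weights = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]       # indices of the k most similar words
        weights[i, nbrs] = np.maximum(sim[i, nbrs], 0.0)
    return weights

def propagate(posteriors, weights, mu=0.5, iterations=10):
    """Smooth per-word subcategory distributions over the graph.

    Assumed update: interpolate each word's original distribution with
    the weighted average of its neighbors' current distributions.
    """
    row_sum = weights.sum(axis=1, keepdims=True)
    has_nbrs = row_sum > 0
    norm_w = np.divide(weights, row_sum,
                       out=np.zeros_like(weights), where=has_nbrs)
    q = posteriors.copy()
    for _ in range(iterations):
        neighbor_avg = norm_w @ q
        mixed = (1.0 - mu) * posteriors + mu * neighbor_avg
        # Words without usable neighbors keep their original distribution.
        q = np.where(has_nbrs, mixed, posteriors)
        q = q / q.sum(axis=1, keepdims=True)
    return q
```

In an EM loop, a step like `propagate` would run on the lexical distributions produced by the M-step, once per pre-terminal tag, before the next E-step. A larger `mu` pulls words more strongly toward their neighbors; `mu = 0` leaves the M-step estimates unchanged.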
