Conference on Computational Natural Language Learning

Punctuation: Making a Point in Unsupervised Dependency Parsing


Abstract

We show how punctuation can be used to improve unsupervised dependency parsing. Our linguistic analysis confirms the strong connection between English punctuation and phrase boundaries in the Penn Treebank. However, approaches that naively include punctuation marks in the grammar (as if they were words) do not perform well with Klein and Manning's Dependency Model with Valence (DMV). Instead, we split a sentence at punctuation and impose parsing restrictions over its fragments. Our grammar inducer is trained on the Wall Street Journal (WSJ) and achieves 59.5% accuracy out-of-domain (Brown sentences with 100 or fewer words), more than 6% higher than the previous best results. Further evaluation, using the 2006/7 CoNLL sets, reveals that punctuation aids grammar induction in 17 of 18 languages, for an overall average net gain of 1.3%. Some of this improvement is from training, but more than half is from parsing with induced constraints, in inference. Punctuation-aware decoding works with existing (even already-trained) parsing models and always increased accuracy in our experiments.
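The idea of splitting a sentence at punctuation and restricting the parse over the resulting fragments can be illustrated with a minimal Python sketch. It is not the authors' implementation. The helper names (`is_punct`, `split_at_punctuation`, `externally_attached`, `satisfies_restriction`), the ASCII-only punctuation test, and the specific restriction checked here (each fragment has exactly one word whose head lies outside the fragment) are all assumptions made for illustration; the abstract does not spell out the exact constraints used.

```python
import string

# Assumption: a token is punctuation if every character is an ASCII punctuation mark.
PUNCT = set(string.punctuation)


def is_punct(token):
    """Return True if the token consists only of punctuation characters."""
    return len(token) > 0 and all(ch in PUNCT for ch in token)


def split_at_punctuation(tokens):
    """Split a sentence into fragments (lists of word indices), dropping punctuation."""
    fragments, current = [], []
    for i, tok in enumerate(tokens):
        if is_punct(tok):
            if current:
                fragments.append(current)
                current = []
        else:
            current.append(i)
    if current:
        fragments.append(current)
    return fragments


def externally_attached(fragment, heads):
    """Words in the fragment whose head lies outside it (None marks the root)."""
    inside = set(fragment)
    return [i for i in fragment if heads[i] not in inside]


def satisfies_restriction(tokens, heads):
    """Hypothetical restriction: exactly one word per fragment attaches externally."""
    return all(
        len(externally_attached(frag, heads)) == 1
        for frag in split_at_punctuation(tokens)
    )


tokens = ["After", "lunch", ",", "we", "left", "."]
# heads maps each word index to its head index; None marks the sentence root.
heads = {0: 4, 1: 0, 3: 4, 4: None}
print(split_at_punctuation(tokens))
print(satisfies_restriction(tokens, heads))
```

A check of this kind could be applied as a filter over candidate trees at decoding time, which is one way to read the abstract's remark that punctuation-aware decoding works with already-trained models.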
