Conference on Empirical Methods in Natural Language Processing

Nightmare at test time: How punctuation prevents parsers from generalizing

Abstract

Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however: human language processing does not rely on punctuation to the same extent, and in informal texts we often leave it out. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive both to the absence of punctuation and to alternative uses of it; (b) neural parsers tend to be more sensitive than vintage parsers; (c) neural parsers trained without punctuation outperform all out-of-the-box parsers across all scenarios where punctuation departs from standard punctuation. Our main experiments are on synthetically corrupted data, to study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data.
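The abstract does not spell out how the data was corrupted, so the following is only a minimal sketch of one plausible variant: deleting punctuation tokens from a dependency-annotated sentence and renumbering the remaining tokens. The tuple layout `(id, form, upos, head)` and the function name `strip_punctuation` are assumptions made for illustration; the `PUNCT` tag is the Universal Dependencies part-of-speech label for punctuation. The sketch assumes a well-formed tree.

```python
from typing import Dict, List, Tuple

# Hypothetical token layout: (id, form, upos, head).
# Ids are 1-based; a head of 0 marks the root.
Token = Tuple[int, str, str, int]


def strip_punctuation(sentence: List[Token]) -> List[Token]:
    """Drop PUNCT tokens and renumber the ids and heads of the rest."""
    by_id: Dict[int, Token] = {tok[0]: tok for tok in sentence}
    kept = [tok for tok in sentence if tok[2] != "PUNCT"]

    # Map old ids to new consecutive ids; the root (0) stays 0.
    new_id: Dict[int, int] = {0: 0}
    for i, tok in enumerate(kept, start=1):
        new_id[tok[0]] = i

    def resolve(head: int) -> int:
        # If a head was itself punctuation, climb to its nearest
        # non-punctuation ancestor (or the root).
        while head != 0 and by_id[head][2] == "PUNCT":
            head = by_id[head][3]
        return head

    return [
        (new_id[tid], form, upos, new_id[resolve(head)])
        for tid, form, upos, head in kept
    ]


# Usage: "Well , it works !" with "works" as the root.
example: List[Token] = [
    (1, "Well", "INTJ", 4),
    (2, ",", "PUNCT", 1),
    (3, "it", "PRON", 4),
    (4, "works", "VERB", 0),
    (5, "!", "PUNCT", 4),
]
stripped = strip_punctuation(example)
```

Applying a transformation like this to training data only, test data only, or both would give the kinds of train/test mismatch the abstract refers to, though the paper's own setup may differ.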
