Journal of Logic and Computation

A word clustering approach to domain adaptation: Robust parsing of source and target domains

Abstract

We present a technique to improve out-of-domain statistical parsing by reducing lexical data sparseness in a PCFG-LA architecture. We replace terminal symbols with unsupervised word clusters acquired from a large newspaper corpus augmented with target domain data. We also investigate the impact of guiding out-of-domain parsing with predicted part-of-speech tags. We provide an evaluation for French, and obtain improvements in performance for both non-technical and technical target domains. Though the improvements over a strong baseline are slight, an interesting result is that the proposed techniques also improve parsing performance on the source domain, contrary to techniques such as self-training, thus leading to a more robust parser overall. We also describe new target domain evaluation treebanks, freely available, that comprise a total of about 3,000 annotated sentences from the medical domain, regional newspaper articles, French Europarl and French Wikipedia.
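The central preprocessing step, replacing terminal symbols with unsupervised word clusters, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a precomputed mapping from word forms to Brown-style bit-string cluster IDs. The function name `clusterize`, the lowercase back-off, the choice to keep unclustered tokens unchanged, and the optional `prefix_len` truncation are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional


def clusterize(
    tokens: List[str],
    clusters: Dict[str, str],
    prefix_len: Optional[int] = None,
) -> List[str]:
    """Replace each token with its word-cluster ID (hypothetical helper).

    `clusters` maps word forms to bit-string cluster IDs (assumed
    Brown-style). Lookup tries the exact form, then the lowercased form.
    If `prefix_len` is set, IDs are truncated to that many bits, which
    yields coarser clusters. Tokens with no cluster are kept unchanged,
    leaving them to the parser's own unknown-word handling.
    """
    result = []
    for tok in tokens:
        cid = clusters.get(tok)
        if cid is None:
            cid = clusters.get(tok.lower())  # back off to lowercase form
        if cid is None:
            result.append(tok)  # no cluster: keep the original form
        elif prefix_len is not None:
            result.append(cid[:prefix_len])  # coarser cluster via bit prefix
        else:
            result.append(cid)
    return result


if __name__ == "__main__":
    # Toy mapping, invented for illustration only.
    toy = {"hôpital": "0110", "clinique": "0111", "patient": "1010"}
    print(clusterize(["Le", "patient", "quitte", "la", "clinique"], toy, prefix_len=3))
```

Running the toy example prints `['Le', '101', 'quitte', 'la', '011']`. "patient" and "clinique" are mapped to 3-bit cluster prefixes, and the unclustered function words are left as they are. Truncating bit-string prefixes is one common way to trade cluster granularity against lexical sparseness. Because the same mapping would be applied to both the training treebank and the target-domain input, rare or unseen target-domain words can share terminals with words seen in training.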