9th International Conference on Language Resources and Evaluation

Extrinsic Corpus Evaluation with a Collocation Dictionary Task

Abstract

NLP researchers and application builders often wonder, "What corpus should I use, or should I build one of my own? If I build one of my own, how will I know if I have done a good job?" Currently there is very little help available to them; they need a framework for evaluating corpora. We develop such a framework for corpora that aim for good coverage of 'general language'. The task we set is the automatic creation of a publication-quality collocations dictionary. For a sample of 100 Czech headwords and 100 English headwords, we identify a gold-standard dataset of (ideally) all the collocations that should appear for these headwords in such a dictionary. The datasets are being made available alongside this paper. We then use them to determine precision and recall for a range of corpora and a range of parameter settings.
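
The scoring step described in the abstract, comparing collocations extracted from a corpus against a gold standard per headword, might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `precision_recall`, the toy headwords and collocates, and the choice of micro-averaging across headwords are all assumptions, since the abstract does not specify how the scores are aggregated.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    candidates: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over the gold-standard headwords.

    Both arguments map a headword to a set of collocates. Headwords that
    are missing from `candidates` count as having no proposed collocations.
    (Hypothetical helper; the aggregation scheme is an assumption.)
    """
    # Collocations proposed from the corpus that also appear in the gold standard.
    true_pos = sum(len(candidates.get(h, set()) & g) for h, g in gold.items())
    # Everything the corpus-based extraction proposed for the sampled headwords.
    proposed = sum(len(candidates.get(h, set())) for h in gold)
    # Everything the gold standard says should be in the dictionary.
    expected = sum(len(g) for g in gold.values())

    precision = true_pos / proposed if proposed else 0.0
    recall = true_pos / expected if expected else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
gold = {"tea": {"strong", "green", "drink"}, "rain": {"heavy", "pour"}}
candidates = {"tea": {"strong", "green", "hot"}, "rain": {"heavy"}}

p, r = precision_recall(candidates, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function over candidate sets extracted from different corpora, or with different extraction parameters, would give one precision/recall pair per configuration, which is the kind of comparison the abstract describes.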