Conference proceedings: Computational Linguistics and Intelligent Text Processing

The Influence of Collocation Segmentation and Top 10 Items to Keyword Assignment Performance



Abstract

Automatic document annotation from a controlled conceptual thesaurus is useful for establishing precise links between similar documents. This study presents a language-independent document annotation system based on features derived from a novel collocation segmentation method. Using the multilingual conceptual thesaurus EuroVoc, we evaluate filtered and unfiltered versions of the method and compare them against other language-independent methods based on single words and bigrams. Testing our new method against the manually tagged multilingual corpus Acquis Communautaire 3.0 (AC), using all descriptors found there, we improve keyword assignment precision from 18 to 29 percent and F-measure from 17.2 to 27.6 when 5 keywords are assigned to a document. Averaged over the 21 languages tested, filtering out the 10 most frequent items improves precision by a further 4 percent, and collocation segmentation improves precision by 9 percent.
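As a rough illustration of the two measurable ideas in the abstract, the sketch below filters the most frequent items out of a corpus and scores one document's assigned keywords against its manually tagged descriptors. It is not the authors' implementation: the function names, the toy data, and the choice to count frequency over the whole corpus are assumptions made for the example.

```python
from collections import Counter


def drop_top_frequent(docs, n=10):
    """Remove the n most frequent items (counted corpus-wide) from each document.

    Assumption: each document is a list of items (words or segments).
    """
    counts = Counter(item for doc in docs for item in doc)
    frequent = {item for item, _ in counts.most_common(n)}
    return [[item for item in doc if item not in frequent] for doc in docs]


def precision_recall_f(assigned, gold):
    """Precision, recall and F-measure for one document's assigned keywords."""
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data: 5 assigned keywords, 4 manually tagged descriptors.
assigned = ["fisheries", "import", "tariff", "quota", "export"]
gold = ["fisheries", "quota", "fishing fleet", "catch"]
p, r, f = precision_recall_f(assigned, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.3f}")
```

With 5 keywords assigned per document, as in the reported setting, precision is the number of correct keywords divided by 5.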
