Pacific Association for Computational Linguistics Conference (PACLING'03); 22-25 August 2003; Halifax, Canada

AUTOMATIC TERM EXTRACTION AND DOCUMENT SIMILARITY IN SPECIAL TEXT CORPORA

Abstract

This paper confirms that the performance of a state-of-the-art automatic term extraction method on a computer science corpus is similar to previously published performance data on a medical corpus. The extracted terms are then used to estimate the similarity of papers in the computer science corpus using the standard Vector Space Model. The retrieval precision of this term-based representation is compared with that of a word-based representation and with that of a link-based similarity metric, which measures the overlap between the papers' local neighborhoods in the citation graph. The term-based approach offers performance comparable to the word-based approach, but potentially with a much smaller vocabulary.
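The abstract compares three ways of scoring paper-to-paper similarity. Below is a minimal Python sketch of that comparison under stated assumptions; it is not the authors' implementation. Tf-idf weighting with cosine similarity stands in for "the standard Vector Space Model". The link-based metric is assumed to be the Jaccard overlap of each paper's one-hop citation neighborhood (papers it cites plus papers citing it). Precision is measured as precision at k against hypothetical relevance judgments. All function names (`tfidf_vectors`, `neighborhood_overlap`, `precision_at_k`, etc.) and the toy corpus are invented for illustration. The term extractor itself is not shown: the `terms` lists are hand-written stand-ins for its output.

```python
"""Hypothetical sketch: term-based vs word-based vs link-based similarity.

Assumptions (not from the paper): tf-idf weights with cosine similarity for
the Vector Space Model, Jaccard overlap of one-hop citation neighborhoods for
the link-based metric, and precision@k against hand-made relevance labels.
"""
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return {doc_id: {feature: tf * idf}} for docs given as feature lists.

    Features can be single words (word-based) or extracted terms (term-based).
    """
    n = len(docs)
    df = Counter()
    for feats in docs.values():
        df.update(set(feats))  # document frequency: count each feature once per doc
    return {
        doc_id: {f: tf * math.log(n / df[f]) for f, tf in Counter(feats).items()}
        for doc_id, feats in docs.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def neighborhood_overlap(graph, a, b):
    """Jaccard overlap of the papers citing or cited by a and by b.

    `graph` maps each paper id to the set of paper ids it cites.
    """
    def neighbors(x):
        cited = set(graph.get(x, ()))
        citing = {y for y, refs in graph.items() if x in refs}
        return cited | citing

    na, nb = neighbors(a), neighbors(b)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def rank(query, candidates, sim):
    """Order candidates (excluding the query itself) by decreasing similarity."""
    others = [c for c in candidates if c != query]
    return sorted(others, key=lambda c: sim(query, c), reverse=True)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved papers that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


if __name__ == "__main__":
    # Toy corpus invented for illustration; "terms" stand in for extractor output.
    words = {
        "p1": "automatic term extraction for text corpora".split(),
        "p2": "term extraction and document similarity".split(),
        "p3": "citation graph link analysis".split(),
    }
    terms = {
        "p1": ["term extraction", "text corpora"],
        "p2": ["term extraction", "document similarity"],
        "p3": ["citation graph", "link analysis"],
    }
    citations = {"p1": {"p3"}, "p2": {"p3"}, "p3": set()}
    relevant_to_p1 = {"p2"}  # hypothetical relevance judgment

    for name, feats in (("word", words), ("term", terms)):
        vecs = tfidf_vectors(feats)
        vocab = {f for fs in feats.values() for f in fs}
        ranked = rank("p1", vecs, lambda a, b: cosine(vecs[a], vecs[b]))
        print(name, len(vocab), ranked, precision_at_k(ranked, relevant_to_p1, 1))

    ranked = rank("p1", citations, lambda a, b: neighborhood_overlap(citations, a, b))
    print("link", ranked, precision_at_k(ranked, relevant_to_p1, 1))
```

In this sketch the word-based and term-based runs differ only in the feature lists passed to `tfidf_vectors`, which mirrors the comparison described in the abstract. On a real corpus the term vocabulary would come from the extraction step, and that is where any vocabulary saving would arise. The link-based run ignores text entirely and uses only the citation sets.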