Digital Libraries, 2006. JCDL '06

A comparative study of citations and links in document classification

Abstract

It is well known that links are an important source of information when dealing with Web collections. However, the question remains whether the techniques used on the Web can also be applied to collections of documents containing citations between scientific papers. In this work we present a comparative study of digital library citations and Web links in the context of automatic text classification. We show that there are in fact differences between citations and links in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains of up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in the digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation-link-based classifier. This combination is based on the notion of classifier reliability and yielded gains of up to 14% in micro-averaged F1 in the Web collection. However, no significant gain was obtained in the digital library. Finally, a user study was performed to further investigate the causes of these results. We discovered that the documents misclassified by the citation-link-based classifiers are in fact difficult cases, hard to classify even for humans.
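The two link-based measures named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook raw-count definitions: co-citation counts the documents that cite both items, and bibliographic coupling counts the references the two items share. The abstract does not say how the paper normalizes or weights these measures, so the function names, the toy graph `cites`, and the use of unnormalized counts are all assumptions made for illustration.

```python
from collections import defaultdict

def invert(cites):
    """Map each document to the set of documents that cite (link to) it."""
    cited_by = defaultdict(set)
    for src, targets in cites.items():
        for dst in targets:
            cited_by[dst].add(src)
    return cited_by

def cocitation(cites, a, b):
    """Raw co-citation count: documents that cite both a and b (assumed unnormalized)."""
    cited_by = invert(cites)
    return len(cited_by.get(a, set()) & cited_by.get(b, set()))

def bibliographic_coupling(cites, a, b):
    """Raw coupling count: references shared by a and b (assumed unnormalized)."""
    return len(cites.get(a, set()) & cites.get(b, set()))

# Hypothetical toy graph: document -> set of documents it cites or links to.
cites = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "a": {"x", "y"},
    "b": {"y", "z"},
}

print(cocitation(cites, "a", "b"))              # a and b are both cited by p1 and p2
print(bibliographic_coupling(cites, "a", "b"))  # a and b both cite y
```

Co-citation depends on incoming links and bibliographic coupling on outgoing references, so the two measures draw on different parts of the graph. That is consistent with the abstract's finding that they behave differently on a Web directory and on a digital library.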