Document clustering of scientific texts using citation contexts

Bader Aljaber; Nicola Stokes; James Bailey; Jian Pei

Abstract

Document clustering has many important applications in data mining and information retrieval. Many existing document clustering techniques use the "bag-of-words" model to represent the content of a document. However, this representation is only effective for grouping related documents when these documents share a large proportion of lexically equivalent terms. In other words, instances of synonymy between related documents are ignored, which can reduce the effectiveness of applications using a standard full-text document representation. To address this problem, we present a new approach for clustering scientific documents based on citation contexts. A citation context is essentially the text surrounding the reference markers used to refer to other scientific works. We hypothesize that citation contexts will provide relevant synonymous and related vocabulary that will help increase the effectiveness of the bag-of-words representation. In this paper, we investigate the power of these citation-specific word features and compare them with the original document's textual representation in a document clustering task on two collections of labeled scientific journal papers from two distinct domains: High Energy Physics and Genomics. We also compare these text-based clustering techniques with a link-based clustering algorithm that determines the similarity between documents based on the number of co-citations, that is, in-links represented by citing documents and out-links represented by cited documents. Our experimental results indicate that the use of citation contexts, when combined with the vocabulary in the full text of the document, is a promising alternative means of capturing critical topics covered by journal articles. More specifically, this document representation strategy, when used by the clustering algorithm investigated in this paper, outperforms both the full-text clustering approach and the link-based clustering technique on both scientific journal datasets.
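The idea of enriching a bag-of-words representation with citation-context vocabulary can be illustrated with a small sketch. This is not the authors' implementation: the marker pattern (numeric references such as [4]), the fixed word window, the raw term counts, and the function names `citation_contexts`, `bag_of_words`, and `cosine` are all assumptions made for illustration. The sketch also assumes that the contexts are taken from passages in other papers that cite the document, which the abstract does not spell out.

```python
import math
import re
from collections import Counter

# Assumed marker style: numeric references such as [3] or [3, 7].
MARKER = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]")
TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase a string and return its alphabetic tokens."""
    return TOKEN.findall(text.lower())


def citation_contexts(citing_text, window=5):
    """Collect up to `window` tokens on each side of every reference marker."""
    if window <= 0:
        return []
    pieces = MARKER.split(citing_text)
    context = []
    # Each adjacent pair of pieces is separated by exactly one marker.
    for left, right in zip(pieces, pieces[1:]):
        context.extend(tokenize(left)[-window:])
        context.extend(tokenize(right)[:window])
    return context


def bag_of_words(full_text, citing_passages=(), window=5):
    """Term counts of the full text plus any citation-context terms."""
    bag = Counter(tokenize(full_text))
    for passage in citing_passages:
        bag.update(citation_contexts(passage, window))
    return bag


def cosine(a, b):
    """Cosine similarity between two term-count bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy documents on a related topic that share no terms.
doc_a = "tumour growth mice"
doc_b = "cancer progression rodents"
# Hypothetical passage from another paper that cites doc_a.
citing_a = ["Cancer progression was studied in mice [4]."]

plain = cosine(bag_of_words(doc_a), bag_of_words(doc_b))
enriched = cosine(bag_of_words(doc_a, citing_a), bag_of_words(doc_b))
print(plain, enriched)
```

In this toy case the full-text bags have no overlap, while the citing passage contributes the term "progression" to the first document, so the enriched similarity becomes non-zero. A practical system would likely add stop-word removal and term weighting before clustering.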

Bibliographic record

Source
Information Retrieval | 2010, Issue 2 | pp. 101-131 | 31 pages
Authors
Bader Aljaber; Nicola Stokes; James Bailey; Jian Pei
Affiliations

Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia;

School of Computer Science and Informatics, University College Dublin, Dublin, Ireland;

NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia;

School of Computing Science, Simon Fraser University, Burnaby, Canada;

Original format: PDF
Language: English
Keywords
citation contexts; document clustering; text categorization

