Graph-based term weighting for information retrieval

Roi Blanco; Christina Lioma

Abstract

A standard approach to Information Retrieval (IR) is to model text as a bag of words. Alternatively, text can be modelled as a graph, whose vertices represent words, and whose edges represent relations between the words, defined on the basis of any meaningful statistical or linguistic relation. Given such a text graph, graph-theoretic computations can be applied to measure various properties of the graph, and hence of the text. This work explores the usefulness of such graph-based text representations for IR. Specifically, we propose a principled graph-theoretic approach to (1) computing term weights and (2) integrating discourse aspects into retrieval. Given a text graph, whose vertices denote terms linked by co-occurrence and grammatical modification, we use graph ranking computations (e.g. PageRank; Page et al., The PageRank citation ranking: Bringing order to the Web, Technical report, Stanford Digital Library Technologies Project, 1998) to derive weights for each vertex, i.e. term weights, which we use to rank documents against queries. We reason that our graph-based term weights do not necessarily need to be normalised by document length (unlike existing term weights) because they are already scaled by their graph-ranking computation. This is a departure from existing IR ranking functions, and we experimentally show that it performs comparably to a tuned ranking baseline, such as BM25 (Robertson et al., NIST Special Publication 500-236: TREC-4, 1995). In addition, we integrate graph properties into ranking, such as the average path length or the clustering coefficient, which represent different aspects of the topology of the graph, and by extension of the document represented as a graph. Integrating such properties into ranking allows us to consider issues such as discourse coherence, flow and density during retrieval. We experimentally show that this type of ranking performs comparably to BM25, and can even outperform it, across different TREC (Voorhees and Harman, TREC: Experiment and evaluation in information retrieval, MIT Press, 2005) datasets and evaluation measures.
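
The pipeline the abstract describes (text graph, graph ranking, term weights, document score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: edges come from a fixed co-occurrence window (`window=2`) with no grammatical-modification links, PageRank uses a damping factor of 0.85 and a fixed number of iterations, and `score` simply sums the weights of matching query terms with no collection-level component. The function names are hypothetical.

```python
def cooccurrence_graph(tokens, window=2):
    """Undirected text graph: link each term to the terms that follow it
    within `window` positions (window size is an assumed parameter)."""
    graph = {}
    for i, term in enumerate(tokens):
        graph.setdefault(term, set())
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                graph[term].add(other)
                graph.setdefault(other, set()).add(term)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph.
    Returns one weight per vertex; the weights sum to 1."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Rank held by isolated vertices is spread uniformly over all vertices.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {}
        for v in graph:
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def score(query_terms, term_weights):
    """Assumed scoring rule: sum the graph-based weights of the query terms
    found in the document, with no document-length normalisation."""
    return sum(term_weights.get(t, 0.0) for t in query_terms)


def average_clustering(graph):
    """Mean local clustering coefficient, one of the graph properties
    the abstract mentions; vertices with fewer than two neighbours count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


# Toy document and query, invented for this example.
doc = ("graph based term weighting for information retrieval "
       "uses graph ranking for term weighting").split()
g = cooccurrence_graph(doc)
weights = pagerank(g)
print(sorted(weights, key=weights.get, reverse=True)[:3])
print(score(["graph", "retrieval"], weights))
print(average_clustering(g))
```

Because PageRank weights sum to 1 over the vertices of each document graph, they are already on a per-document scale, which is the intuition behind the abstract's remark about not needing document-length normalisation.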

Bibliographic details

Source
Information Retrieval | 2012, Issue 1 | pp. 54-92 | 39 pages
Authors
Roi Blanco; Christina Lioma
Affiliations

Computer Science Department, University of A Coruña, A Coruña, Spain;

Computer Science Department, Stuttgart University, Stuttgart, Germany

Original format: PDF
Language: English
Keywords
information retrieval; graph theory; natural language processing
