An interpretation of index term weighting schemes based on document components

Abstract

A theory of indexing is presented, based on viewing a document as constituted of components. A component may be chosen as any run of text units that can be (a) judged as to its relevance and (b) considered independent within the document. By relating the constituent components of a document to the universe of all components in the collection, we have been able to apply Bayes' decision theory to derive the index term representation of the document, and to attach an initial probabilistic weight to each term based on a Principle of Document Self-Recovery. It turns out that different choices of document component, such as a single word or a whole abstract, can lead to different probability-based term weighting schemes that have been introduced before: specifically, Edmundson and Wyllys' term significance formula, and Sparck Jones' inverse document frequency, which Croft and Harper later modified into the 'combination match' formula. Thus, a unified interpretation of various probabilistic term weighting schemes appears possible.
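As a rough illustration of two of the schemes the abstract names, the sketch below computes Sparck Jones' inverse document frequency and a Croft-Harper-style combination match score. It assumes the commonly cited textbook forms, log(N / n) for IDF and c + log((N - n) / n) for each query term that matches the document, rather than reproducing the component-based derivation of this paper; the function names `idf` and `combination_match`, the constant `c`, and the toy counts are hypothetical.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Sparck Jones' inverse document frequency in the common log(N / n) form.

    num_docs: N, the number of documents in the collection.
    doc_freq: n, the number of documents containing the term (must be > 0).
    """
    return math.log(num_docs / doc_freq)


def combination_match(query_terms, doc_terms, doc_freqs, num_docs, c=1.0):
    """Croft-Harper-style combination match score (assumed textbook form).

    Every query term that also occurs in the document adds
    c + log((N - n) / n), where c is a tunable constant. For each matching
    term, 0 < n < N is assumed so that the logarithm is defined.
    """
    score = 0.0
    # Only terms shared by the query and the document contribute.
    for term in set(query_terms) & set(doc_terms):
        n = doc_freqs[term]
        score += c + math.log((num_docs - n) / n)
    return score


if __name__ == "__main__":
    # Hypothetical toy collection statistics: N = 1000 documents.
    df = {"index": 10, "weight": 500, "term": 40}
    print(round(idf(1000, 10), 3))
    print(round(combination_match(["index", "weight"],
                                  ["index", "weight", "scheme"], df, 1000), 3))
```

Note the design consequence of the combination match form: a term occurring in exactly half the collection (n = N / 2, like "weight" above) contributes only the constant c, and terms occurring in more than half the documents receive a negative log component.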

Bibliographic details

Source
Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval | 1986 | pp. 275-283 | 9 pages
Author
K. L. Kwok
Original format: PDF
CLC classification: Computing technology; computer technology

Similar documents

1. Novel term weighting schemes for document representation based on ranking of terms and Fuzzy logic with semantic relationship of terms [J]. Lakshmi R., Baskar S. Expert Systems with Applications. 2019, issue DECa
2. Term frequency - function of document frequency: a new term weighting scheme for enterprise information retrieval [J]. Hui Zhang, Deqing Wang, Wenjun Wu. Enterprise Information Systems. 2012, no. 4
3. The impact of term-weighting schemes and similarity measures on extractive multi-document text summarization [J]. Sanchez-Gomez Jesus M., Vega-Rodriguez Miguel A., Perez Carlos J. Expert Systems with Applications. 2021, May issue
4. A new term weighting scheme based on class specific document frequency for document representation and classification [C]. Plansangket Suthira, Gan John Q. Computer Science and Electronic Engineering Conference. 2015
5. A single document-based term weighting scheme by supporting terms [D]. Cheng, Juan. 2006
6. A Part-Of-Speech Term Weighting Scheme for Biomedical Information Retrieval [O]. Yanshan Wang, Stephen Wu, Dingcheng Li
7. An Approach for Combining Multiple Weighting Schemes and Ranking Methods in Graph-Based Multi-Document Summarization [O]. Abeer Alzuhair, Mohammed Al-Dhelaan. 2019
8. Improve Precategorized Collection Retrieval by Using Supervised Term Weighting Schemes [R]. Zhao, Y., Karypis, G. 2001