An Ontology-Based Binary-Categorization Approach for Recognizing Multiple-Record Web Documents Using a Probabilistic Retrieval Model

QUAN WANG; YIU-KAI NG


Abstract

The Web contains a tremendous amount of information. It is challenging to determine which Web documents are relevant to a user query, and even more challenging to rank them according to their degrees of relevance. In this paper, we propose a probabilistic retrieval model using logistic regression for recognizing multiple-record Web documents against an application ontology, a simple conceptual modeling approach. We notice that many Web documents contain a sequence of chunks of textual information, each of which constitutes a "record." Documents of this type are referred to as multiple-record documents. In our categorization approach, a document is represented by a set of term frequencies of index terms, a density heuristic value, and a grouping heuristic value. We first apply logistic regression analysis to the relevance probabilities using the (i) index terms, (ii) density value, and (iii) grouping value of each training document. Thereafter, the relevance probability of each test document is interpolated from the fitted curves. Contrary to other probabilistic retrieval models, our model makes only a weak independence assumption and is capable of handling any important dependent relationships among index terms. In addition, we use logistic regression, instead of linear regression analysis, because the relevance probabilities of training documents are discrete. On one test set of car-ad Web documents and another of obituary Web documents, our probabilistic model achieves an average recall ratio of 100%, a precision ratio of 83.3%, and an accuracy ratio of 92.5%.
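As a rough illustration of the kind of model the abstract describes, the sketch below fits a logistic regression over three per-document features and then computes recall, precision, and accuracy for a set of binary decisions. It is not the authors' procedure: the abstract does not give the estimation details, so the gradient-descent fit, the function names (`fit_logistic`, `relevance_probability`, `evaluate`), the learning rate, and the toy feature values are all assumptions made for illustration. As a further simplification, the set of index-term frequencies is collapsed into a single hypothetical term score.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def relevance_probability(w, x):
    """Estimated probability that a document with features x is relevant."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def evaluate(y_true, y_pred):
    """Recall, precision, and accuracy for binary labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }

# Hypothetical training documents: [term score, density value, grouping value].
# These numbers are invented for the sketch and do not come from the paper.
train_X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.9], [0.7, 0.6, 0.8],
           [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]]
train_y = [1, 1, 1, 0, 0, 0]  # discrete relevance labels

w = fit_logistic(train_X, train_y)

# Hypothetical test documents with known labels.
test_X = [[0.85, 0.8, 0.8], [0.15, 0.2, 0.2]]
test_y = [1, 0]
pred = [1 if relevance_probability(w, x) >= 0.5 else 0 for x in test_X]
print(evaluate(test_y, pred))
```

The 0.5 decision threshold is likewise an assumption; a binary categorizer of this kind could use any cut-off on the estimated relevance probability.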


Bibliographic details

Source: Information Retrieval, 2003, No. 4, pp. 295-332 (38 pages)
Authors: QUAN WANG; YIU-KAI NG
Affiliation: Computer Science Department, Brigham Young University, Provo, Utah 84602, USA
Original format: PDF
Language: English
Chinese Library Classification: Library science and librarianship
Keywords: probabilistic model; logistic regression analysis; application ontology; binary categorization

Similar documents

1. Performing Binary-Categorization on Multiple-Record Web Documents Using Information Retrieval Models and Application Ontologies [J]. LINUS W. KWONG, YIU-KAI NG. World Wide Web, 2003, No. 3.
2. A novel approach for ontology-based dimensionality reduction for web text document classification [J]. Elhadad Mohamed K., Badran Khaled Shafee S., Salama Gouda I. International Journal of Software Innovation, 2017, No. 4.
3. AN ONTOLOGY-BASED INTELLIGENT INFORMATION RETRIEVAL METHOD FOR DOCUMENT RETRIEVAL [J]. POONAM YADAV, R.P. SINGH. International Journal of Engineering Science and Technology, 2012, No. 9.
4. A binary-categorization approach for classifying multiple-record Web documents using application ontologies and a probabilistic model [C]. Yiu-Kai Ng, Tang, J. 2001.
5. An ontology-based application to detect, annotate and search Web documents: First results. [D]. Husain, Jawad Bin. 2004.
6. Integrating Multiple Models Using Image-as-Documents Approach for Recognizing Fine-Grained Home Contexts [O]. Sinan Chen, Sachio Saiki, Masahide Nakamura. 2020.
7. Recognizing ontology-applicable multiple-record Web documents [O]. D. W. Embley, Y.-K. Ng, L. Xu. 2001.
