Semi-supervised document retrieval

Ming Li; Hang Li; Zhi-Hua Zhou

Abstract

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, referred to as SSRank, aims to combine the advantages of traditional Information Retrieval (IR) methods and of the recently proposed supervised learning methods for IR. These advantages are the ability to work with a limited amount of labeled data and a rich model representation. To achieve this, the method adopts a semi-supervised learning framework for ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for those queries. It then uses all the labeled data to train a machine learning model (in our case, a neural network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that, given the same amount of labeled data, SSRank consistently, and almost always significantly, outperforms the baseline methods (unsupervised and supervised learning methods). This is because SSRank can effectively leverage unlabeled data in learning.
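
The abstract names BM25 as the traditional IR model used during data labeling but does not give its formula. For reference, here is a minimal Python sketch of standard Okapi BM25 scoring, assuming whitespace-tokenized text and the commonly used defaults k1 = 1.2 and b = 0.75. The helper name `bm25_scores`, these parameter values, and the toy documents are assumptions for illustration, not taken from the paper. The sketch only shows how BM25 ranks documents for a query; it is not SSRank's labeling procedure or its stopping criterion.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in set(query):  # repeated query terms are counted once here
            if tf[term] == 0:
                continue
            # IDF with +1 inside the log so it stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection and query, for illustration only.
docs = [
    "semi supervised learning for ranking".split(),
    "neural network ranking model".split(),
    "cooking recipes for pasta".split(),
]
query = "semi supervised ranking".split()
scores = bm25_scores(query, docs)
# Document indices sorted from highest to lowest BM25 score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

Using BM25 in data labeling, as the abstract describes, would mean treating scores like these as one signal when assigning labels to unlabeled documents; the paper's actual labeling rule and stopping criterion are not reproduced here.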

Bibliographic details

Source: Information Processing & Management, 2009, No. 3, pp. 341-355 (15 pages)
Authors: Ming Li; Hang Li; Zhi-Hua Zhou
Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

Microsoft Research Asia, 49 Zhichun Road, Haidian District, Beijing 100080, China;

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

Record information
Original format: PDF
Language: English
Keywords
information retrieval; machine learning; data mining; learning to rank; semi-supervised learning
Date added to database: 2022-08-17 23:20:23

Similar documents

1. Semi-supervised ranking for document retrieval [J]. Kevin Duh, Katrin Kirchhoff. Computer Speech and Language. 2011, No. 2

2. Accreditation of public health agencies: a means, not an end [J]. Russo P. Journal of Public Health Management and Practice: JPHMP. 2007, No. 4

3. Content-Based Document Image Retrieval Based on Document Modeling [J]. Shiah Chwan-Yi. Journal of Intelligent Information Systems. 2020, No. 2

4. Visual Analytic System for Subject Matter Expert Document Tagging using Information Retrieval and Semi-Supervised Machine Learning [C]. Craig Hagerman, Richard Brath, Scott Langevin. 2019 23rd International Conference Information Visualization. 2019

5. Semi-supervised document clustering with active learning [D]. Huang, Ruizhang. 2008

6. Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation [O]. Chao Wei, Senlin Luo, Xincheng Ma. 2011

7. Semi-Supervised Document Retrieval [O]. Ming Li, Hang Li, Zhi-hua Zhou. 2008
