
Comparative study of pattern recognition, neural network and statistical regression approaches to information retrieval.



Abstract

This dissertation presents several new retrieval methods that combine Bayes' theorem with probability density estimation techniques. The new methods estimate the probability of relevance from a small set of statistical features characterizing document-query pairs, such as query length, within-document term frequency, and the number of matching terms between a document and a query.

The central task in computing the probability of relevance in the proposed methods is to infer, from training examples, the density functions of the feature vector in the relevant and irrelevant classes. Both parametric and non-parametric methods are employed to estimate these density functions.

A two-layer neural network is also presented. It takes as input a feature vector representing a document-query pair and returns the probability of relevance. Simple and complex neural networks are compared for retrieval performance, and the results show that the more complex designs do not significantly outperform the simplest design.

The performance of seven retrieval methods is compared: linear discriminant, quadratic discriminant, k-nearest neighbor, kernel method, neural network, linear regression, and logistic regression. All seven methods are trained on a common training set and then applied to two large test sets, the TREC-5 test set and the TREC-6 test set.

The experimental results suggest that the seven retrieval methods may be divided into two groups. The first group consists of the logistic regression, linear regression, linear discriminant, and neural network methods, whereas the second group consists of the quadratic discriminant, k-nearest neighbor, and kernel methods. The methods within the first group perform approximately equally well on the test sets, and any method in the first group outperforms any method in the second group. In addition to being less effective in retrieval, both the kernel method and the k-nearest neighbor method are computationally intensive.
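The abstract's core idea, turning class-conditional density estimates into a probability of relevance through Bayes' theorem, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names are hypothetical, the toy feature vectors (number of matching terms, summed within-document term frequency) are made up for illustration, and the parametric estimate assumes a Gaussian with diagonal covariance in each class, which is a simplification of the discriminant methods the abstract names.

```python
import math

def fit_gaussian(rows):
    """Estimate per-feature mean and variance for one class.

    Assumption: features are treated as independent (diagonal covariance).
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    # A small floor keeps every variance strictly positive.
    variances = [
        max(sum((r[j] - means[j]) ** 2 for r in rows) / n, 1e-6)
        for j in range(dims)
    ]
    return means, variances

def log_density(x, params):
    """Log of the diagonal-Gaussian density at feature vector x."""
    means, variances = params
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )

def probability_of_relevance(x, rel_params, irr_params, prior_rel):
    """Bayes' theorem: P(rel | x) = p(x|rel)P(rel) / sum over both classes."""
    log_rel = math.log(prior_rel) + log_density(x, rel_params)
    log_irr = math.log(1.0 - prior_rel) + log_density(x, irr_params)
    diff = log_irr - log_rel
    if diff > 700:  # guard against overflow in math.exp
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

# Made-up training examples: (matching terms, summed term frequency).
relevant = [(4, 9), (5, 12), (3, 7), (4, 10)]
irrelevant = [(1, 2), (2, 3), (1, 1), (2, 4), (1, 3), (2, 2)]

rel_params = fit_gaussian(relevant)
irr_params = fit_gaussian(irrelevant)
prior_rel = len(relevant) / (len(relevant) + len(irrelevant))

for pair in [(4, 10), (1, 2)]:
    p = probability_of_relevance(pair, rel_params, irr_params, prior_rel)
    print(pair, round(p, 4))
```

Documents would then be ranked for a query by this posterior. A non-parametric variant, in the spirit of the kernel and k-nearest neighbor methods mentioned above, would replace `log_density` with an estimate built directly from the training points, at the cost of scanning the training set for every document-query pair.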
