US Government Technical Reports

Statistical Methods for Technical Document Retrieval.

Abstract

The RADC Automatic Document Classification On-Line (RADCOL) system is a tool for testing statistical procedures for document analysis and retrieval and for designing operational systems. This report describes experiments conducted with the RADCOL system. As had been predicted, procedures for clustering word stems did not yield substantial savings in storage space and computing time, and an unclustered thesaurus gave better retrieval performance. Three new versions of the system were implemented, assigning weights of 0.0, 0.5, and 1.0, respectively, to identity correlations (correlations of word stems with themselves). Because the version using identity correlations of 1.0 performed best, a simplified version of the retrieval technique was recommended for use with science and technology abstracts. The simplified system would eliminate automatic thesaurus generation, use a large technical vocabulary, and retrieve documents through direct correlations between queries and documents. These experiments are believed to be the most comprehensive series of tests of statistical retrieval methods performed on a data base of realistic size. Further experimentation is recommended to determine how applicable statistical methods are to other types of intelligence data bases and user requirements.
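The abstract does not give RADCOL's formulas, so the Python sketch below only illustrates the general idea under stated assumptions. Documents and queries are treated as bags of word stems; the thesaurus is a table of stem-to-stem correlations (here, hypothetically, the cosine similarity of two stems' occurrence counts across the collection); and a query-document score sums q_i · C_ij · d_j over all stem pairs. The `identity_weight` parameter replaces the diagonal entries C_ii, mirroring the 0.0, 0.5, and 1.0 variants described above, while `direct_score` stands in for the simplified system with no thesaurus. The toy documents, function names, and scoring rule are all hypothetical and are not taken from the report.

```python
import math

# Hypothetical toy collection: each document maps word stems to term counts.
DOCS = {
    "d1": {"radar": 2, "signal": 1},
    "d2": {"signal": 1, "nois": 1},
    "d3": {"radar": 1, "antenna": 2},
}


def stem_correlations(docs):
    """Assumed thesaurus: cosine similarity between every pair of distinct
    stems, using each stem's counts across the documents as its vector."""
    stems = sorted({s for d in docs.values() for s in d})
    vecs = {s: [d.get(s, 0) for d in docs.values()] for s in stems}
    norms = {s: math.sqrt(sum(x * x for x in v)) for s, v in vecs.items()}
    corr = {}
    for a in stems:
        for b in stems:
            if a != b:  # diagonal left out; it is supplied by identity_weight
                dot = sum(x * y for x, y in zip(vecs[a], vecs[b]))
                corr[(a, b)] = dot / (norms[a] * norms[b])
    return corr


def thesaurus_score(query, doc, corr, identity_weight):
    """Sum of q_i * C_ij * d_j over stem pairs, with C_ii = identity_weight."""
    total = 0.0
    for qs, qw in query.items():
        for ds, dw in doc.items():
            c = identity_weight if qs == ds else corr.get((qs, ds), 0.0)
            total += qw * c * dw
    return total


def direct_score(query, doc):
    """Simplified system: direct query-document correlation (a plain dot
    product in this sketch), with no thesaurus at all."""
    return sum(w * doc.get(s, 0) for s, w in query.items())


def rank(query, docs, corr, identity_weight):
    """Document ids sorted by descending thesaurus_score."""
    return sorted(
        docs,
        key=lambda k: thesaurus_score(query, docs[k], corr, identity_weight),
        reverse=True,
    )


if __name__ == "__main__":
    corr = stem_correlations(DOCS)
    query = {"radar": 1}
    for w in (0.0, 0.5, 1.0):
        print(f"identity weight {w}: {rank(query, DOCS, corr, w)}")
    print("direct:", {k: direct_score(query, d) for k, d in DOCS.items()})
```

With this toy collection, an identity weight of 0.0 ranks d3 above d1 even though d1 contains the query stem twice, because only thesaurus associations contribute to the score; with 0.5 or 1.0, exact stem matches put d1 first. With an empty thesaurus and an identity weight of 1.0, `thesaurus_score` reduces to `direct_score`, which is one way to see why dropping thesaurus generation simplifies the system.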

