Machine-learned approach to determining document relevance for search over large electronic collections of documents

Abstract

The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.
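The training and re-ranking flow described in the abstract can be illustrated with a small sketch. The abstract does not name a classifier type, so the multinomial naive Bayes model used here is an assumption, as are all function names, the sample documents, and the threshold. In practice the negative cases could come from a statistical search, as the abstract suggests; here they are hard-coded stand-ins.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def train_relevance_model(positives, negatives):
    """Fit per-token log-odds weights from positive and negative documents.

    The model is query-independent: it uses only document text,
    never the query that retrieved the document.
    """
    pos_counts = Counter(t for doc in positives for t in tokenize(doc))
    neg_counts = Counter(t for doc in negatives for t in tokenize(doc))
    vocab = set(pos_counts) | set(neg_counts)
    # Add-one (Laplace) smoothing so unseen tokens do not yield log(0).
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    weights = {
        t: math.log((pos_counts[t] + 1) / pos_total)
        - math.log((neg_counts[t] + 1) / neg_total)
        for t in vocab
    }
    bias = math.log(len(positives) / len(negatives))  # class prior log-odds
    return weights, bias


def relevance_score(model, text):
    # Log-odds that the document is relevant; tokens outside the vocabulary add 0.
    weights, bias = model
    return bias + sum(weights.get(t, 0.0) for t in tokenize(text))


def rerank(model, results):
    # Order search results from most to least relevant under the model.
    return sorted(results, key=lambda doc: relevance_score(model, doc), reverse=True)


def filter_results(model, results, threshold=0.0):
    # Keep only results whose score exceeds the (hypothetical) threshold.
    return [doc for doc in results if relevance_score(model, doc) > threshold]


# Human-selected positive cases (illustrative data only).
positives = [
    "official guide to installing the printer driver",
    "step by step guide to configure printer settings",
]
# Stand-ins for negatives that a statistical search might supply.
negatives = [
    "forum thread printer broke again lol",
    "random forum post about printer prices",
]

model = train_relevance_model(positives, negatives)
results = [
    "forum thread about printer noise",
    "official guide to printer troubleshooting",
]
ranked = rerank(model, results)
print(ranked)
```

With this toy data, the guide-style document is ranked above the forum post, and `filter_results(model, results)` keeps only the former. A production system would likely use richer features and a stronger learner; the sketch is only meant to show how one trained model can serve both the filtering and the re-ranking roles.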

Bibliographic details

Publication number: US7287012B2
Publication date: 2007-10-23
Original format: PDF
Applicant/Patentee: SIMON H. CORSTON; RAMAN CHANDRASEKAR; HARR CHEN
Application number: US20040754159
Inventors: SIMON H. CORSTON; RAMAN CHANDRASEKAR; HARR CHEN
Filing date: 2004-01-09
Classification: G06F15/18; G06F17/00
Country: US
Date added to database: 2022-08-21 21:02:36