SYSTEMS AND METHODS FOR A SCALABLE CONTINUOUS ACTIVE LEARNING APPROACH TO INFORMATION CLASSIFICATION

Abstract

Systems and methods for classifying electronic information are provided by way of a Technology-Assisted Review (“TAR”) process. In certain embodiments, the TAR process is a Scalable Continuous Active Learning (“S-CAL”) approach. In certain embodiments, S-CAL selects an initial sample from a document collection, trains a classifier by using a default classification for a portion of the initial sample, scores the initial sample, selects a sub-sample from the initial sample for review, removes the reviewed sub-sample from the initial sample, and repeats the process by re-training the classifier until the initial sample is exhausted. In certain embodiments, a classification threshold is determined using a calculated estimate of the prevalence of relevant information such that the threshold classifies the information in accordance with a determined target criterion. In certain embodiments, the estimate of prevalence is determined from the results of iterations of a TAR process such as S-CAL.
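The iterative loop described above can be illustrated with a short Python sketch. It is one plausible reading of the abstract, not the patented implementation, and it rests on several assumptions. Documents are modeled as token sets. A smoothed per-term log-odds model stands in for the classifier. The "default classification" is taken to mean that a few randomly chosen unreviewed documents are presumed non-relevant during training. The reviewed sub-sample is the `batch` highest-scoring remaining documents, and the batch grows by roughly 10% per round. The threshold step assumes the target criterion is a recall level, so it keeps the score cutoff from the first iteration whose running count of relevant documents reaches that fraction of the prevalence-based estimate. All names (`s_cal`, `choose_threshold`, `classify`, `n_default`, and so on) are hypothetical.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Doc = FrozenSet[str]  # assumption: a document is modeled as a set of tokens
Model = Dict[str, float]


def train(pos: List[Doc], neg: List[Doc]) -> Model:
    """Stand-in classifier (hypothetical): smoothed per-term log-odds of relevance."""
    vocab = set().union(*pos, *neg)
    model: Model = {}
    for term in vocab:
        p = (sum(term in d for d in pos) + 1) / (len(pos) + 2)
        q = (sum(term in d for d in neg) + 1) / (len(neg) + 2)
        model[term] = math.log(p / q)
    return model


def score(model: Model, doc: Doc) -> float:
    """Sum the weights of the document's terms; unseen terms contribute 0."""
    return sum(model.get(term, 0.0) for term in doc)


def s_cal(collection: List[Doc], review: Callable[[Doc], bool], sample_size: int,
          n_default: int = 10, seed: int = 0) -> Tuple[float, List[Tuple[Model, float, int]]]:
    """Hypothetical S-CAL loop: review the initial sample until it is exhausted."""
    rng = random.Random(seed)
    sample = rng.sample(collection, min(sample_size, len(collection)))
    n_sample = len(sample)
    if n_sample == 0:
        raise ValueError("empty sample")
    pos: List[Doc] = []
    neg: List[Doc] = []
    history = []  # per iteration: (model, lowest score reviewed, relevant found so far)
    batch = 1
    while sample:
        # Default classification: presume a random portion of the unreviewed sample non-relevant.
        defaults = rng.sample(sample, min(n_default, len(sample)))
        model = train(pos, neg + defaults)
        # Score the remaining sample; the top-scoring documents form the sub-sample to review.
        sample.sort(key=lambda d: score(model, d), reverse=True)
        reviewed, sample = sample[:batch], sample[batch:]  # reviewed docs leave the sample
        for d in reviewed:
            (pos if review(d) else neg).append(d)
        history.append((model, score(model, reviewed[-1]), len(pos)))
        batch += math.ceil(batch / 10)  # assumption: grow the batch by ~10% per round
    return len(pos) / n_sample, history  # prevalence of relevant docs in the sample


def choose_threshold(prevalence: float, n_sample: int,
                     history: List[Tuple[Model, float, int]],
                     target_recall: float) -> Tuple[Model, float]:
    """Return the model and cutoff from the first iteration whose running relevant
    count reaches target_recall times the prevalence-based estimate."""
    target = target_recall * prevalence * n_sample
    for model, threshold, found in history:
        if found >= target:
            return model, threshold
    model, threshold, _ = history[-1]
    return model, threshold


def classify(collection: List[Doc], model: Model, threshold: float) -> List[Doc]:
    """Label as relevant every document scoring at or above the cutoff."""
    return [d for d in collection if score(model, d) >= threshold]
```

Because every document in the sample is eventually reviewed, the returned prevalence is exact for the sample and serves only as an estimate for the full collection. With `target_recall=0.8`, for example, `choose_threshold` keeps the cutoff in effect once the running count first reached 80% of the estimated number of relevant sample documents, and `classify` applies that cutoff to the whole collection.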