Journal: Bioinformatics

How many samples are needed to build a classifier: a general sequential approach



Abstract

Motivation: The standard paradigm for classifier design is to obtain a sample of feature-label pairs and then apply a classification rule to derive a classifier from the sample data. In laboratory settings, the sample size is typically limited by cost, time or the availability of sample material. An investigator may therefore wish to consider a sequential approach that enrolls enough patients to train a classifier capable of supporting a sound diagnostic decision, while keeping the number of patients as small as possible so that the study remains affordable.

Results: A sequential classification procedure is studied via the martingale central limit theorem. The procedure updates the classification rule at each step and provides stopping criteria that ensure, with a prescribed confidence, that at stopping a future subject will have a misclassification probability below a predetermined threshold. Simulation studies and applications to microarray data analysis are provided. The procedure has several attractive properties: (1) it updates the classification rule sequentially and thus does not rely on distributions of primary measurements from other studies; (2) it assesses the stopping criteria at each sequential step and can therefore substantially reduce cost through early stopping; and (3) it is not restricted to any particular classification rule and therefore applies to any parametric or non-parametric method, including feature selection or extraction.
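The abstract does not spell out the paper's actual stopping criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a "test-then-train" scheme: each new subject is first classified by the rule trained on all earlier subjects, and only then is that subject added to the training set. One reason a martingale central limit theorem suits this setting is that these one-step-ahead errors, minus their conditional error probabilities, form a martingale difference sequence even though the rule changes at every step. Here that idea is approximated by a plain one-sided normal upper confidence bound on the running error rate. The nearest-centroid rule on a single simulated feature, the function names (`fit_nearest_centroid`, `sequential_design`, `simulate`) and the parameters `threshold`, `confidence`, `burn_in` and `min_eval` are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def fit_nearest_centroid(xs, ys):
    """Hypothetical toy rule: the per-class mean of a single 1-D feature."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_centroid(means, x):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def sequential_design(stream, threshold=0.2, confidence=0.95, burn_in=10, min_eval=20):
    """Test-then-train loop with a normal-approximation stopping rule (an assumption).

    Each subject is first predicted by the rule fitted on all earlier subjects,
    then added to the training set. Sampling stops once the one-sided upper
    confidence bound on the running one-step-ahead error rate is below
    `threshold`. Returns (subjects_used, error_rate, upper_bound, stopped).
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided standard normal quantile
    xs, ys, errors = [], [], []
    error_rate, upper = float("nan"), float("inf")
    for x, y in stream:
        # Predict only after the burn-in and once both classes have been observed.
        if len(xs) >= burn_in and len(set(ys)) == 2:
            rule = fit_nearest_centroid(xs, ys)  # rule updated with all data so far
            errors.append(int(predict_nearest_centroid(rule, x) != y))
        xs.append(x)
        ys.append(y)
        n = len(errors)
        if n >= min_eval:
            error_rate = sum(errors) / n
            upper = error_rate + z * math.sqrt(error_rate * (1 - error_rate) / n)
            if upper < threshold:
                return len(xs), error_rate, upper, True  # early stop
    return len(xs), error_rate, upper, False  # stream exhausted without stopping


def simulate(n, shift, seed=0):
    """Hypothetical data stream: class 0 ~ N(-shift, 1), class 1 ~ N(+shift, 1)."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.randint(0, 1)
        yield rng.gauss(shift if y else -shift, 1.0), y


if __name__ == "__main__":
    print(sequential_design(simulate(1000, shift=2.0, seed=1)))
```

With well-separated classes the loop typically stops after a modest number of subjects, whereas with `shift=0.0` (no signal) it runs through the whole stream. Two caveats apply to this sketch: the running average mixes the error rates of earlier, weaker rules, so it is usually conservative for the final rule; and the plug-in variance collapses to zero when no errors have been seen, which is why `min_eval` evaluated subjects are required before any stopping decision.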
