A structured approach to predictive modeling of a two-class problem using multidimensional data sets

Abstract

Biological experiments in the post-genome era can generate a staggering amount of complex data, which challenges experimentalists to extract meaningful information. Increasingly, the success of an appropriately controlled experiment relies on a robust data analysis pipeline. In this paper, we present a structured approach to the analysis of multidimensional data that relies on close, two-way communication between the bioinformatician and the experimentalist. A sequential approach employing data exploration (visualization, graphical and analytical study), pre-processing, feature reduction and supervised classification using machine learning is presented. This standardized approach is illustrated by an example from a proteomic data analysis that has been used to predict the risk of infectious disease outcome. Strategies for model selection and post-hoc model diagnostics are presented and applied to the case illustration. We discuss some of the practical lessons we have learned in applying supervised classification to multidimensional data sets, one of which is the importance of feature reduction in achieving optimal modeling performance.
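
The sequential approach described in the abstract (pre-processing, feature reduction, supervised classification, then model assessment) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the function names, the mean-difference feature ranking, and the nearest-centroid classifier are stand-ins for whatever methods the authors actually used. The one design point it tries to show is that scaling and feature reduction are refit inside each cross-validation fold, so the held-out samples never influence which features are kept.

```python
import random
import statistics


def standardize(train, test):
    """Scale every feature with the mean and SD of the training rows only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

    return scale(train), scale(test)


def top_features(X, y, k):
    """Feature reduction (illustrative): keep the k features whose class
    means differ most. Assumes X is already standardized and y is 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, label in zip(X, y) if label == 0]
        b = [x[j] for x, label in zip(X, y) if label == 1]
        scores.append((abs(statistics.fmean(a) - statistics.fmean(b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Two-class classifier (illustrative): assign the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [min(centroids, key=lambda l: sq_dist(x, centroids[l])) for x in X_test]


def cross_validated_accuracy(X, y, k_features=5, n_folds=5, seed=0):
    """Estimate accuracy with k-fold cross-validation, refitting the scaling
    and the feature selection on the training part of every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        X_tr, X_te = standardize([X[i] for i in train_idx], [X[i] for i in fold])
        y_tr = [y[i] for i in train_idx]
        keep = top_features(X_tr, y_tr, k_features)
        X_tr = [[row[j] for j in keep] for row in X_tr]
        X_te = [[row[j] for j in keep] for row in X_te]
        preds = nearest_centroid_predict(X_tr, y_tr, X_te)
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def make_synthetic(n=60, n_features=50, n_informative=3, seed=1):
    """Hypothetical data: many noise features, a few shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    print("cross-validated accuracy:", cross_validated_accuracy(X, y))
```

Selecting features on the full data set before cross-validation would leak information from the held-out samples and inflate the accuracy estimate, which is why the selection step sits inside the fold loop in this sketch.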