Journal: Methods: A Companion to Methods in Enzymology

A structured approach to predictive modeling of a two-class problem using multidimensional data sets

Abstract

Biological experiments in the post-genome era can generate a staggering amount of complex data that challenges experimentalists to extract meaningful information. Increasingly, the success of an appropriately controlled experiment relies on a robust data analysis pipeline. In this paper, we present a structured approach to the analysis of multidimensional data that relies on a close, two-way communication between the bioinformatician and experimentalist. A sequential approach employing data exploration (visualization, graphical and analytical study), pre-processing, feature reduction and supervised classification using machine learning is presented. This standardized approach is illustrated by an example from a proteomic data analysis that has been used to predict the risk of infectious disease outcome. Strategies for model selection and post hoc model diagnostics are presented and applied to the case illustration. We discuss some of the practical lessons we have learned applying supervised classification to multidimensional data sets, one of which is the importance of feature reduction in achieving optimal modeling performance.
