Conference: IEEE International Conference on Information Reuse and Integration

An Empirical Study of Supervised Learning for Biological Sequence Profiling and Microarray Expression Data Analysis

Abstract

Recent years have seen a growing volume of high-throughput biological data available for genetic disease profiling, protein structure and function prediction, and the discovery of new drugs and therapies. High-throughput experiments produce high-volume and/or high-dimensional data, which make it difficult for molecular biologists and other domain experts to digest and interpret the results both properly and quickly. In this paper, we provide basic background for computer scientists on how supervised learning tools can be applied to biological problems, focusing primarily on two types: biological sequence profiling and microarray expression data analysis. We apply a set of supervised learning methods to four biological data analysis tasks: (1) gene promoter site prediction; (2) splice junction prediction; (3) protein structure prediction; and (4) gene expression data analysis. We argue that although existing studies favor one or two learning methods (such as Support Vector Machines), such conclusions may be biased, mainly because the evaluation measures used in those studies are inadequate. A range of learning algorithms should be considered for different scenarios, depending on the objectives and requirements of the application, such as system running time or prediction accuracy on minority-class examples.
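
The argument that a single measure can make one learner look best while it fails on the cases that matter (for example, the rare true sites in promoter or splice junction prediction) can be illustrated with a minimal sketch. It assumes a hypothetical binary task in which positives are rare; the labels, predictions, and helper names (`accuracy`, `class_accuracy`, `balanced_accuracy`, `minority_label`) are illustrative only and are not taken from the paper. "Minority-class accuracy" is computed here as recall on the rarer class, which is one common reading of the term.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Overall accuracy: fraction of all examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracy(y_true, y_pred, label):
    """Accuracy on examples whose true class is `label` (recall for that class)."""
    idx = [i for i, t in enumerate(y_true) if t == label]
    return sum(y_pred[i] == label for i in idx) / len(idx)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies, so each class counts equally."""
    labels = sorted(set(y_true))
    return sum(class_accuracy(y_true, y_pred, c) for c in labels) / len(labels)


def minority_label(y_true):
    """Return the least frequent true class."""
    counts = Counter(y_true)
    return min(counts, key=counts.get)


# Hypothetical imbalanced task: 95 negatives (label 0) and 5 positives (label 1).
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts the majority class.
pred_a = [0] * 100

# Hypothetical classifier B: mislabels 7 negatives as positive and misses 1 of 5 positives.
pred_b = [0] * 88 + [1] * 7 + [1] * 4 + [0]

if __name__ == "__main__":
    minority = minority_label(y_true)
    for name, pred in [("A", pred_a), ("B", pred_b)]:
        print(
            f"{name}: accuracy={accuracy(y_true, pred):.2f}  "
            f"minority-class accuracy={class_accuracy(y_true, pred, minority):.2f}  "
            f"balanced accuracy={balanced_accuracy(y_true, pred):.2f}"
        )
```

On this toy data, A is ahead on overall accuracy (0.95 vs 0.92) but never detects a positive (minority-class accuracy 0.00 vs 0.80), a reversal that a single accuracy figure hides. Reporting per-class or balanced measures alongside overall accuracy is one way to avoid favoring a learner for the wrong reason.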