
Analysis of machine learning algorithms on bioinformatics data of varying quality.


Abstract

One of the main applications of machine learning in bioinformatics is the construction of classification models that can accurately classify new instances using information gained from previous instances. With the help of machine learning algorithms (such as supervised classification and gene selection), new and meaningful knowledge can be extracted from bioinformatics datasets to help in disease diagnosis and prognosis, as well as in prescribing the right treatment for a disease. One particular challenge encountered when analyzing bioinformatics datasets is data noise, which refers to incorrect or missing values in datasets. Noise can be introduced as a result of experimental errors (e.g. faulty microarray chips, insufficient resolution, image corruption, and incorrect laboratory procedures), as well as other errors (errors during data processing, transfer, and/or mining). A special type of data noise is class noise, which occurs when an instance (example) is mislabeled. Previous research has shown that class noise has a detrimental impact on machine learning algorithms (e.g. worsened classification performance and unstable feature selection). In addition to data noise, gene expression datasets can suffer from the problems of high dimensionality (a very large feature space) and class imbalance (unequal distribution of instances between classes). As a result of these inherent problems, constructing accurate classification models becomes more challenging.

To provide guidance to researchers and practitioners in deciding which machine learning algorithms to apply in their analysis, this dissertation performs thorough empirical investigations of machine learning algorithms on bioinformatics data of varying quality. Comprehensive experiments are performed to assess the robustness of machine learning techniques to class noise. First, we provide a detailed experimental analysis of feature selection techniques as well as classification algorithms in the context of data quality. We then investigate the effectiveness of three forms of ensemble classification techniques when learning from balanced bioinformatics datasets in the context of data quality. We also investigate the importance of alleviating class imbalance for classification problems on bioinformatics datasets. Finally, we address the combined problem of high dimensionality and class imbalance in the context of data quality.
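The two data-quality problems named in the abstract, class noise and class imbalance, can be made concrete with a small sketch. The code below is illustrative only and is not taken from the dissertation: the function names `inject_class_noise` and `imbalance_ratio`, the uniform random label-flipping scheme, and the toy 20/80 label vector are all assumptions chosen for clarity.

```python
import random


def inject_class_noise(labels, noise_rate, seed=0):
    """Return a copy of binary labels (0/1) with a fraction of them flipped.

    Assumption: noise is injected uniformly at random across all instances.
    Other schemes (e.g. flipping only one class) are equally plausible.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0 and 1")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Choose distinct positions whose labels will be flipped.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (binary labels)."""
    ones = sum(labels)
    zeros = len(labels) - ones
    if min(ones, zeros) == 0:
        raise ValueError("both classes must be present")
    return max(ones, zeros) / min(ones, zeros)


# Toy label vector (hypothetical): 20 positive and 80 negative instances.
clean = [1] * 20 + [0] * 80
noisy = inject_class_noise(clean, noise_rate=0.10)

n_changed = sum(a != b for a, b in zip(clean, noisy))
print("mislabeled instances:", n_changed)
print("imbalance ratio (clean):", imbalance_ratio(clean))
print("imbalance ratio (noisy):", imbalance_ratio(noisy))
```

A robustness experiment of the kind the abstract describes would, under these assumptions, train the same learner on `clean` and on `noisy` labels and compare performance on an untouched test set.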

Bibliographic record

  • Author

    Abu Shanab, Ahmad

  • Author affiliation

    Florida Atlantic University

  • Degree-granting institution: Florida Atlantic University
  • Subjects: Bioinformatics; Information technology; Computer science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 155 p.
  • Total pages: 155
  • Original format: PDF
  • Language of text: English