
Boosting algorithms for mining biomedical and biological data.


Abstract

Biomedical informatics is an interdisciplinary field that uses information technologies to analyze and understand biological and biomedical data in order to improve the detection, prevention, and treatment of disease. Data obtained from these applications contain valuable information that requires advanced computational techniques for extraction and analysis. Machine learning and data mining techniques have proven to be excellent tools for extracting the knowledge encapsulated in the data in the form of various patterns. Boosting is an adaptive supervised machine learning technique that has been successfully applied in many different applications. It is a robust method that generates multiple classifiers from a base learner and combines them into an ensemble to build a better classifier. The base learner can be any weak learning algorithm; even when it has already been optimized for accuracy, boosting can still improve on it. This thesis focuses on applying boosting algorithms to classification tasks on biomedical informatics data and comparing their performance against other traditional machine learning algorithms. Two critical data mining problems are investigated: early detection of breast cancer (which is critical for saving the lives of cancer patients) and prediction of the 3D structure of proteins (which is useful for functional classification). A model for early cancer detection is built to achieve a higher AUC in a clinically relevant region. Protein structure prediction is performed with both flat and hierarchical classification approaches. In both approaches, boosting achieved better accuracy than other successful algorithms in the literature. Boosting not only yields improved accuracy but is also very efficient.
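The idea of generating several classifiers from one weak base learner and combining them can be illustrated with a small sketch. The abstract does not say which boosting variant or base learner the thesis uses, so what follows is an assumption for illustration only: discrete AdaBoost with one-feature threshold classifiers (decision stumps), run on a made-up one-dimensional toy data set that no single stump can classify perfectly. The function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # Weighted error: sum of weights of the misclassified points.
                err = sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if (polarity if xi[j] <= thr else -polarity) != yi
                )
                if err < best_err:
                    best, best_err = (j, thr, polarity), err
    return best, best_err

def stump_predict(stump, x):
    j, thr, polarity = stump
    return polarity if x[j] <= thr else -polarity

def adaboost_fit(X, y, n_rounds=3):
    """Discrete AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n              # start with uniform example weights
    ensemble = []                  # list of (alpha, stump) pairs
    for _ in range(n_rounds):
        stump, err = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        if err >= 0.5:             # base learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (made up): the negative class sits between two positive groups,
# so one threshold alone cannot separate the classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]

model = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(model, x) for x in X])
```

Each round reweights the training examples so that the next stump concentrates on the points the earlier ones got wrong, which is the "adaptive" part of the method described above. A single stump misclassifies at least two of the six toy points, while the weighted vote of three stumps fits all of them.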

Bibliographic record

  • Author: Krishnaraj, Yazhene
  • Author affiliation: Wayne State University
  • Degree-granting institution: Wayne State University
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: M.S.
  • Year: 2009
  • Pages: 81
  • Original format: PDF
  • Language: English
