Analysis of machine learning algorithms on bioinformatics data of varying quality.

Abstract

One of the main applications of machine learning in bioinformatics is the construction of classification models that can accurately classify new instances using information gained from previous instances. With the help of machine learning algorithms (such as supervised classification and gene selection), new and meaningful knowledge can be extracted from bioinformatics datasets to support disease diagnosis and prognosis, as well as the choice of the right treatment for a disease. One particular challenge encountered when analyzing bioinformatics datasets is data noise, which refers to incorrect or missing values in a dataset. Noise can be introduced by experimental errors (e.g., faulty microarray chips, insufficient resolution, image corruption, and incorrect laboratory procedures) as well as by other errors (during data processing, transfer, and/or mining). A special type of data noise is class noise, which occurs when an instance (example) is mislabeled. Previous research has shown that class noise has a detrimental impact on machine learning algorithms (e.g., worsened classification performance and unstable feature selection). In addition to data noise, gene expression datasets can suffer from high dimensionality (a very large feature space) and class imbalance (an unequal distribution of instances between classes). These inherent problems make it more challenging to construct accurate classification models.

To provide guidance to researchers and practitioners in deciding which machine learning algorithms to apply in their analysis, this dissertation performs thorough empirical investigations of machine learning algorithms on bioinformatics data of varying quality. Comprehensive experiments are performed to assess the robustness of machine learning techniques to class noise. First, we provide a detailed experimental analysis of feature selection techniques and classification algorithms in the context of data quality. We then investigate the effectiveness of three forms of ensemble classification techniques when learning from balanced bioinformatics datasets in the context of data quality. We also investigate the importance of alleviating class imbalance for classification problems on bioinformatics datasets. Finally, we address the combined problem of high dimensionality and class imbalance in the context of data quality.
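To make the notions of class noise and class imbalance concrete, the following is a minimal Python sketch that flips a fraction of binary labels in a small imbalanced label set. It is an illustration only and is not taken from the dissertation: the function names `inject_class_noise` and `class_ratio`, the uniform random flipping scheme, the 10% noise rate, and the 10-versus-90 class split are all assumptions chosen for the example.

```python
import random


def inject_class_noise(labels, noise_rate, seed=0):
    """Return a copy of binary labels (0/1) with a fraction flipped.

    Hypothetical helper: flips are chosen uniformly at random, which is
    only one of several possible ways to simulate mislabeled instances.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def class_ratio(labels):
    """Fraction of instances that belong to the positive class (label 1)."""
    return sum(labels) / len(labels)


# Assumed toy data: 10 positive and 90 negative instances (class imbalance).
clean = [1] * 10 + [0] * 90
noisy = inject_class_noise(clean, noise_rate=0.10)

# Number of instances whose label no longer matches the clean label.
n_changed = sum(a != b for a, b in zip(clean, noisy))
print("mislabeled instances:", n_changed)
print("positive-class ratio before:", class_ratio(clean))
print("positive-class ratio after:", class_ratio(noisy))
```

A study of robustness would typically train the same learner on `clean` and on `noisy` labels and compare the resulting performance; that comparison is left out here because the abstract does not specify the learners or metrics used.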

Bibliographic record

Author: Abu Shanab, Ahmad
Author affiliation: Florida Atlantic University
Degree-granting institution: Florida Atlantic University
Subjects: Bioinformatics; Information technology; Computer science
Degree: Ph.D.
Year: 2015
Pages: 155
Original format: PDF
Language: English
