Integrated feature subset selection/extraction with applications in bioinformatics.

Abstract

Feature subset selection and extraction algorithms are studied actively and extensively in the machine learning literature as a way to reduce the dimensionality of the feature space, since many machine learning and pattern recognition algorithms do not handle high-dimensional data sets efficiently or effectively. When we turn to the analysis of large-scale bioinformatics data sets, such as microarray gene expression data sets, the high dimensionality of the feature space, compounded by the low dimensionality of the sample space, creates even more problems for data analysis algorithms.

Two foremost characteristics of microarray gene expression data sets are (1) the correlation between features (genes) and (2) the availability of domain knowledge in computable format. In this dissertation, we study effective feature selection and extraction algorithms with applications to the analysis of newly emerging data sets in the bioinformatics domain. Microarray gene expression data, the result of large-scale RNA profiling techniques, are the primary focus of this thesis. Several novel feature (gene) selection and extraction algorithms are proposed to deal with the peculiarities of microarray gene expression data sets.

To address the first characteristic, we first propose a general feature selection algorithm called Boost Feature Subset Selection (BFSS), based on permutation analysis, to broaden the scope of the selected gene set and thus improve classification performance. In BFSS, each subsequent feature to be selected focuses on those samples where the previously selected features fail. Our experiments showed the benefit of BFSS for single-gene scores based on the t-score and S2N (signal to noise) on a variety of publicly available microarray gene expression data sets.

We then examine the correlations among features (genes) explicitly to see whether such correlations are informative for sample classification. This results in our gene extraction algorithm, called virtual gene. A virtual gene is a group of genes whose expression levels are combined linearly. The combined expression levels of a virtual gene, instead of the real gene expression levels, are used for sample classification. Our experiments confirm that by taking the correlations between gene pairs into consideration, we can indeed build a better sample classifier.

A microarray gene expression data set represents only one aspect of our knowledge of the underlying biological system. A large amount of biological knowledge is now available in computable format over the Internet. To address the second characteristic of microarray gene expression data sets, we investigate the integration of domain knowledge, such as that embedded in Gene Ontology annotations, into gene selection and extraction. (Abstract shortened by UMI.)
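The BFSS idea described in the abstract, where each new gene is scored with more emphasis on the samples that earlier genes misclassify, can be illustrated with a small sketch. The abstract does not give the scoring formulas or the reweighting rule, so everything here is an assumption: the sketch uses the common S2N definition (difference of class means divided by the sum of class standard deviations), a one-gene midpoint-threshold classifier, and an AdaBoost-style multiplicative weight update in place of the dissertation's permutation-based procedure. The names `weighted_s2n`, `boost_select`, and the parameter `up` are hypothetical.

```python
import numpy as np

def weighted_s2n(X, y, w):
    """Per-feature S2N score (mu1 - mu0) / (sd1 + sd0) using sample weights w.

    Assumed definition; with uniform weights it reduces to the plain S2N score.
    """
    stats = []
    for c in (0, 1):
        m = y == c
        wc = w[m] / w[m].sum()               # normalize weights within the class
        mu = wc @ X[m]                       # weighted mean of each feature
        sd = np.sqrt(wc @ (X[m] - mu) ** 2)  # weighted standard deviation
        stats.append((mu, sd))
    (mu0, sd0), (mu1, sd1) = stats
    return (mu1 - mu0) / (sd1 + sd0 + 1e-12)

def boost_select(X, y, k, up=2.0):
    """Greedily pick k features, up-weighting samples the last pick got wrong.

    Illustrative only: the reweighting rule is an assumption, not the
    procedure from the dissertation.
    """
    n, p = X.shape
    w = np.full(n, 1.0 / n)
    chosen = np.zeros(p, dtype=bool)
    selected = []
    for _ in range(k):
        s = weighted_s2n(X, y, w)
        rank = np.abs(s)
        rank[chosen] = -np.inf               # never pick the same feature twice
        j = int(np.argmax(rank))
        chosen[j] = True
        selected.append(j)
        # One-gene classifier: threshold at the midpoint of the two class means.
        thr = 0.5 * (X[y == 0, j].mean() + X[y == 1, j].mean())
        pred = (X[:, j] > thr) if s[j] > 0 else (X[:, j] < thr)
        miss = pred.astype(int) != y
        w[miss] *= up                        # focus the next score on failures
        w /= w.sum()
    return selected

# Synthetic data: 40 samples, 10 "genes", gene 3 shifted in class 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 10))
X[y == 1, 3] += 5.0
selected = boost_select(X, y, k=3)
print(selected)
```

Because the weights change only for misclassified samples, a gene that is highly correlated with an already selected gene gains little from the update, which is one plausible way to read the abstract's claim that BFSS broadens the scope of the selected gene set.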

Bibliographic record

Author: Xu, Xian
Author affiliation: State University of New York at Buffalo
Degree-granting institution: State University of New York at Buffalo
Subject: Computer Science; Biology, Bioinformatics
Degree: Ph.D.
Year: 2006
Pages: 209 p.
Format: PDF
Language: English (eng)
