Journal: Parallel Algorithms and Applications

A novel PSO-based graph-theoretic approach for identifying most relevant and non-redundant gene markers from gene expression data



Abstract

Cancer is an extremely complex and heterogeneous genetic disease involving many mutations. Researchers in molecular genetics have proposed a number of key genes that probably contribute to oncogenesis and that could serve as potential drug targets for different types of cancer; however, this identification is still an ongoing process. In this article, not only the relevance of each gene but also the redundancy among genes is taken into account. To identify non-redundant gene markers from microarray gene expression data, a graph-theoretic approach is presented. The sample-versus-gene data matrix provided by a microarray is first converted into a weighted, undirected, complete feature graph in which each node represents a gene and is weighted by that gene's relevance, and each edge is weighted by the similarity (correlation) between the two genes it joins. Then, the densest subgraph, i.e. the one with the minimum average edge weight (similarity) and the maximum average node weight (relevance), is identified from the original feature graph. To find this subgraph, binary particle swarm optimisation is applied with a single objective function that simultaneously minimises the average edge weight and maximises the average node weight. The result is an optimised, reduced subgraph whose selected genes have very low average correlation and very high average relevance. The proposed method is compared with sequential forward search, the t-test, the rank-sum test, the minimum-redundancy maximum-relevance (mRMR) scheme, correlation-based feature selection, sequential backward elimination and the fast correlation-based filter on several real-life data sets, in terms of sensitivity, specificity, accuracy, F-score, area under the receiver operating characteristic curve, average correlation and stability.
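The abstract does not state the exact relevance measure or how the two averages are combined into one objective, so the following is only a minimal sketch under explicit assumptions. Node weights are taken as precomputed relevance scores already scaled to [0, 1]; edge weights are the absolute Pearson correlation between two genes (treating strong negative correlation as redundancy too); and the single objective is the hypothetical combination "average node weight minus average edge weight" over the genes selected by a bit mask. The optimiser shown is the common sigmoid-transfer variant of binary PSO. The function names `build_feature_graph`, `fitness` and `binary_pso`, and all parameter defaults, are illustrative and not taken from the paper.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def build_feature_graph(genes, relevance):
    """Complete undirected graph: node weight = relevance, edge weight = |correlation|.

    `genes` is a list of expression profiles (one list of sample values per gene).
    Assumption: `relevance` holds precomputed scores in [0, 1], one per gene.
    """
    if len(relevance) != len(genes):
        raise ValueError("need one relevance score per gene")
    edges = {(i, j): abs(pearson(genes[i], genes[j]))
             for i, j in combinations(range(len(genes)), 2)}
    return list(relevance), edges


def fitness(mask, nodes, edges):
    """Hypothetical single objective: mean node weight minus mean edge weight
    of the subgraph induced by the genes whose bit is 1 in `mask`."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if len(chosen) < 2:
        return float("-inf")  # a subgraph needs at least one edge to average
    avg_node = sum(nodes[i] for i in chosen) / len(chosen)
    pairs = list(combinations(chosen, 2))  # (i, j) with i < j, matching edge keys
    avg_edge = sum(edges[p] for p in pairs) / len(pairs)
    return avg_node - avg_edge


def binary_pso(nodes, edges, n_particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Sigmoid-transfer binary PSO maximising `fitness`; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    d = len(nodes)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, nodes, edges) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))  # clamp keeps exp() well-behaved
                # Sigmoid of the velocity is the probability that bit j becomes 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i], nodes, edges)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Made-up toy data: 4 genes x 6 samples, for illustration only.
    genes = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # gene 0
        [1.1, 2.1, 2.9, 4.2, 5.0, 6.1],  # gene 1: nearly a copy of gene 0 (redundant)
        [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],  # gene 2
        [2.0, 2.5, 1.0, 3.0, 2.0, 1.5],  # gene 3
    ]
    relevance = [0.9, 0.85, 0.7, 0.2]  # hypothetical precomputed relevance scores
    nodes, edges = build_feature_graph(genes, relevance)
    mask, score = binary_pso(nodes, edges)
    print("selected genes:", [i for i, b in enumerate(mask) if b],
          "fitness:", round(score, 3))
```

Because every pair of genes is an edge, the penalty term grows with redundancy in the selected set: a mask that keeps both gene 0 and its near-copy gene 1 pays a large average edge weight, which is the behaviour the single objective is meant to discourage. In practice the relevance scores and the weighting between the two averages would need to follow whatever the paper actually specifies.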

