Venue: IEEE International Conference on Bioinformatics and Bioengineering

Frequent weighted itemset mining from gene expression data

Abstract

Gene Expression Datasets (GEDs) usually consist of the expression values of thousands of genes measured across hundreds of samples. Frequent itemset and association rule mining algorithms have been applied to GEDs to discover significant co-expressions among multiple genes. To perform these analyses, gene expression values are commonly discretized into a predefined number of bins. This expert-driven, non-trivial preprocessing step can bias the quality of the mining result. This paper presents a novel approach to discovering gene correlations from GEDs that does not require data discretization. By representing per-sample gene expression values as item weights, frequent weighted itemsets can be extracted directly. Mining weighted itemsets instead of traditional (unweighted) ones spares experts from having to discretize GEDs before analyzing them, and thus improves the effectiveness of the knowledge discovery process. Experiments performed on real GEDs demonstrate the effectiveness of the proposed approach.
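The abstract does not define how the weight of an itemset is computed, so the following Python sketch only illustrates the general idea under stated assumptions, not the paper's actual algorithm. It assumes that expression values have been rescaled to non-negative weights in [0, 1], that the weight of an itemset in a sample is the minimum weight among its genes, and that weighted support is the mean of that value over all samples. With these assumptions the measure is anti-monotone, so a plain Apriori-style level-wise search can be used. The names `weighted_support`, `frequent_weighted_itemsets`, `min_wsup`, and the toy data are hypothetical.

```python
from itertools import combinations


def weighted_support(itemset, samples):
    """Mean over samples of the minimum weight among the itemset's genes.

    Assumption: a gene missing from a sample has weight 0.0.
    """
    total = 0.0
    for weights in samples:
        total += min(weights.get(gene, 0.0) for gene in itemset)
    return total / len(samples)


def frequent_weighted_itemsets(samples, min_wsup, max_size=3):
    """Level-wise (Apriori-style) search for itemsets with support >= min_wsup."""
    genes = sorted({gene for sample in samples for gene in sample})
    result = {}
    level = [frozenset([gene]) for gene in genes]
    size = 1
    while level and size <= max_size:
        kept = []
        for itemset in level:
            support = weighted_support(itemset, samples)
            if support >= min_wsup:
                result[itemset] = support
                kept.append(itemset)
        # Build candidates one item larger; keep a candidate only if every
        # subset of the previous size is itself frequent (anti-monotone pruning).
        size += 1
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in result for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        level = sorted(candidates, key=sorted)
    return result


# Toy data (made up): one dict per sample, gene -> rescaled expression weight.
samples = [
    {"g1": 0.9, "g2": 0.8, "g3": 0.1},
    {"g1": 0.7, "g2": 0.6, "g3": 0.2},
    {"g1": 0.8, "g2": 0.9, "g3": 0.0},
]
found = frequent_weighted_itemsets(samples, min_wsup=0.5)
for itemset, support in found.items():
    print(sorted(itemset), round(support, 3))
```

No binning step appears anywhere in the sketch: the continuous weights enter the support computation directly, which is the property the abstract emphasizes. A different aggregation function (for example, the maximum or the average of the item weights) would change which itemsets qualify and may not allow the same pruning.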