ACM Annual Symposium on Applied Computing

Clustering of Diverse Genomic Data using Information Fusion

Abstract

Genome sequencing projects and high-throughput technologies such as DNA and protein arrays have produced a very large amount of information-rich data. Microarray experimental data are a valuable but limited source for inferring gene regulation mechanisms on a genomic scale. Additional information, such as promoter sequences of genes/DNA-binding motifs, gene ontologies, and location data, can increase the statistical significance of the findings when combined with gene expression analysis. This paper introduces a machine learning approach to information fusion for combining heterogeneous genomic data. The algorithm uses an unsupervised joint learning mechanism that identifies clusters of genes using the combined data. The correlation between gene expression time-series patterns obtained under different experimental conditions and the presence of several distinct and repeated motifs in the genes' upstream sequences is examined using publicly available yeast cell-cycle data. The results show that the combined learning approach identifies correlated genes effectively. The algorithm provides an automated clustering method but allows the user to specify a priori, using probabilities, the influence of each data type on the final clustering.
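
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each gene carries an expression time series and a set of upstream motifs, fuses a correlation distance and a Jaccard distance through a user-chosen weight w_expr (standing in for the a priori probability given to the expression data), and groups genes by average-linkage agglomerative clustering. All function names, the weighting scheme, and the toy genes g1 to g3 are hypothetical.

```python
from itertools import combinations
from math import sqrt


def expression_distance(a, b):
    """Correlation distance in [0, 1]; close to 0 for perfectly correlated profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # flat profile: treat as uncorrelated (an assumption)
    return (1 - cov / (norm_a * norm_b)) / 2


def motif_distance(a, b):
    """Jaccard distance between two sets of upstream motifs."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def fused_distance(g1, g2, w_expr):
    """Convex combination of both distances; w_expr weights the expression data."""
    return (w_expr * expression_distance(g1["expr"], g2["expr"])
            + (1 - w_expr) * motif_distance(g1["motifs"], g2["motifs"]))


def cluster(genes, k, w_expr=0.5):
    """Average-linkage agglomerative clustering on the fused distance."""
    names = list(genes)
    dist = {frozenset(p): fused_distance(genes[p[0]], genes[p[1]], w_expr)
            for p in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average fused distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Synthetic toy input in which the two data types disagree.
toy = {
    "g1": {"expr": [1, 2, 3, 4], "motifs": {"m1"}},
    "g2": {"expr": [4, 3, 2, 1], "motifs": {"m1"}},
    "g3": {"expr": [1, 2, 3, 4], "motifs": {"m2"}},
}
print(cluster(toy, k=2, w_expr=1.0))  # expression only: [['g1', 'g3'], ['g2']]
print(cluster(toy, k=2, w_expr=0.0))  # motifs only:     [['g1', 'g2'], ['g3']]
```

Moving w_expr between 0 and 1 shifts the grouping from motif-driven to expression-driven, which is one simple way to picture how a user-specified prior over data types could steer the final clustering.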