Journal: Applied Mathematics and Computation

Parallel extraction of association rules from genomics data

Abstract

High-throughput experimental platforms such as microarrays produce massive amounts of omics data for each analyzed sample. For example, the Affymetrix DMET (Drug Metabolizing Enzymes and Transporters) microarray platform can detect Single Nucleotide Polymorphisms (SNPs) in 225 human genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs, enabling large pharmacogenomics studies. Moreover, applying such platforms to large populations of subjects further increases the size of the experimental datasets produced in clinical studies. The production of big omics datasets is thus one reason to use parallel computing in bioinformatics. Such omics datasets are usually analyzed with classical statistical methods and, more recently, with data mining methods that can extract knowledge hidden in the data, e.g. by highlighting multiple associations among features of the data. However, applying standard off-the-shelf data mining algorithms to large omics datasets, especially for association rule mining, poses two main issues: (i) very large main-memory requirements, which may prevent data mining software from running on personal/desktop computers; and (ii) very long response times, which may increase the time required to complete extensive pharmacogenomics studies. To overcome the limits of standard association rule mining algorithms when applied to omics datasets, we propose PARES (Parallel Association Rules Extractor from SNPs), a novel parallel algorithm for the efficient extraction of association rules from omics datasets. PARES is implemented as a multi-threaded version of an optimized Frequent Pattern Growth (FP-Growth) algorithm. In addition, it includes a customized preprocessing strategy for SNP datasets, based on a Fisher's test filter, that discards trivial transactions from the input dataset, thereby reducing the search space from which many independent FP-Trees are built. The experimental results show that […]
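The abstract names FP-Growth and multi-threaded mining of many independent FP-Trees but gives no implementation details. The following is a minimal Python sketch of textbook FP-Growth in which each frequent item's conditional pattern base is mined as an independent task in a thread pool, followed by confidence-based rule generation. It is not PARES: the Fisher's test filter and the PARES optimizations are omitted, and the function names, the hypothetical "SNP=genotype" item encoding, and the thresholds are illustrative assumptions. Supports are absolute transaction counts. Because of CPython's global interpreter lock, the threads here only illustrate the task decomposition and should not be expected to give a speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


class Node:
    """An FP-tree node: item label, count, parent link, children by item."""

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return (header, supports)."""
    support = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)  # header: item -> nodes
    for items, count in weighted:
        # Drop infrequent items; insert the rest by descending support.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def conditional_base(header, item):
    """Prefix paths above each node of `item`, weighted by that node's count."""
    base = []
    for node in header[item]:
        prefix, parent = [], node.parent
        while parent.item is not None:  # walk up to the root
            prefix.append(parent.item)
            parent = parent.parent
        if prefix:
            base.append((prefix, node.count))
    return base


def mine(weighted, min_support, suffix=()):
    """Sequential FP-Growth: return {frozenset(itemset): support}."""
    header, frequent = build_tree(weighted, min_support)
    result = {}
    for item, sup in frequent.items():
        itemset = suffix + (item,)
        result[frozenset(itemset)] = sup
        result.update(mine(conditional_base(header, item), min_support, itemset))
    return result


def mine_parallel(transactions, min_support, workers=4):
    """Mine each top-level item's conditional database as a separate task."""
    header, frequent = build_tree([(t, 1) for t in transactions], min_support)

    def task(item):
        found = {frozenset([item]): frequent[item]}
        found.update(mine(conditional_base(header, item), min_support, (item,)))
        return found

    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for found in pool.map(task, frequent):
            result.update(found)
    return result


def rules(itemsets, min_confidence):
    """(antecedent, consequent, confidence) for every rule above the threshold."""
    out = []
    for itemset, sup in itemsets.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                conf = sup / itemsets[lhs]  # every subset is frequent too
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf))
    return out


if __name__ == "__main__":
    # Hypothetical subjects, each a transaction of "SNP=genotype" items.
    data = [["snp1=var", "snp2=var", "snp3=wt"], ["snp1=var", "snp2=var"],
            ["snp1=var", "snp3=wt"], ["snp2=var", "snp3=wt"],
            ["snp1=var", "snp2=var", "snp3=wt"]]
    for lhs, rhs, conf in rules(mine_parallel(data, min_support=3), 0.7):
        print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

With `min_support=3` on this toy data, the three single items (support 4) and the three pairs (support 3) are frequent, so each pair yields two rules with confidence 3/4 = 0.75. Splitting the work by top-level item is one common way to obtain independent FP-Tree subproblems, because each conditional pattern base can be mined without reference to the others; in a production setting, processes or a language without a global interpreter lock would be needed for real parallel speedup.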