Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Expectation Maximization of Frequent Patterns, a Specific, Local, Pattern-Based Biclustering Algorithm for Biological Datasets


Abstract

Current binary biclustering algorithms are too slow and too non-specific to handle biological datasets with a large number of attributes, a capability that is essential for the computational biology problem of microarray analysis. Such an algorithm may require specialized computers to run and, because of its large resource needs, may still fail to produce a solution. The resulting biclusters also include too many false positives (type I errors), which hinders biological discovery. We propose an algorithm, EMFP, that can analyze datasets with a large attribute set at different densities and can run on a laptop, which makes it accessible to practitioners. EMFP produces biclusters that have a very low Root Mean Squared Error and false positive rate, with very few type II errors. Our binary biclustering algorithm is a hybrid, axis-parallel, pattern-based algorithm that finds multiple, non-overlapping, near-constant, deterministic, binary submatrices, using a variable confidence threshold and, as a novel feature, local density comparisons rather than the standard global threshold. EMFP introduces a new and intuitive way to calculate internal measures for binary biclustering methods. We also introduce a framework to ease comparison with other algorithms, and we compare EMFP to both binary and general biclustering algorithms using two real and 80 synthetic databases.
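The abstract does not define its internal measures or its density comparison, so the following is only an illustrative sketch and not EMFP itself. It assumes that a candidate bicluster is a set of rows and columns of a 0/1 matrix, that the ideal bicluster is a constant block of 1s, and that "local versus global" can be read as comparing the density of the block with the density of the whole matrix. The function names `density` and `rmse_vs_all_ones` and the toy matrix are hypothetical.

```python
from math import sqrt

def density(matrix, rows, cols):
    """Fraction of 1s in the submatrix selected by rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)

def rmse_vs_all_ones(matrix, rows, cols):
    """RMSE between the submatrix and an assumed ideal block of all 1s."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sqrt(sum((1 - v) ** 2 for v in cells) / len(cells))

# Toy binary dataset (hypothetical): rows are samples, columns are attributes.
data = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

# Candidate bicluster: first three rows and first three columns.
rows, cols = [0, 1, 2], [0, 1, 2]

global_density = density(data, range(len(data)), range(len(data[0])))
local_density = density(data, rows, cols)
error = rmse_vs_all_ones(data, rows, cols)

print(f"global density: {global_density:.3f}")
print(f"local density:  {local_density:.3f}")
print(f"RMSE vs all-ones block: {error:.3f}")
```

In this toy case the candidate block is much denser than the matrix as a whole and has a small error against the all-ones ideal, which is the kind of near-constant submatrix the abstract describes. How EMFP actually scores and selects such blocks is not stated in the abstract.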
