
Mining low-variance biclusters to discover coregulation modules in sequencing datasets


Abstract

High-throughput sequencing (ChIP-Seq) data report binding events together with their possible binding locations and strengths; the locations of the resulting peaks must then be interpreted. Recent methods tend to summarize all ChIP-Seq peaks detected within a limited region upstream and downstream of each gene into one real-valued score that quantifies the probability of regulation in that region. Applying subspace clustering techniques to these scores can help discover important knowledge such as potential co-regulation or co-factor mechanisms. An ideal bicluster would contain a subset of genes and a subset of transcription factors (TFs) such that its cell values are distributed around a mean value with very low variance. Such biclusters would indicate TF sets regulating gene sets with very similar probability values. However, most existing biclustering algorithms neither enforce low variance as a desired property of a bicluster nor use variance as a guiding metric while searching for desirable biclusters. In this paper we present an algorithm that searches the space of all overlapping biclusters, organized in a lattice, and uses an upper bound on the variance of biclusters as the guiding metric. We show the algorithm to be an efficient and effective method for discovering possibly overlapping biclusters under pre-defined variance bounds. We present the algorithm and its results on synthetic, ChIP-Seq, and motif datasets, and compare them with results obtained by other algorithms to demonstrate its power and effectiveness.
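The low-variance criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' lattice search: it is a brute-force enumeration, usable only on tiny matrices, and the function names, the toy score matrix, and the threshold are all hypothetical choices made for illustration. It assumes the population variance of the selected cells as the measure.

```python
from itertools import combinations
from statistics import pvariance


def bicluster_variance(matrix, rows, cols):
    """Population variance of the cells selected by rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return pvariance(cells)


def low_variance_biclusters(matrix, max_variance, min_rows=2, min_cols=2):
    """Return every (rows, cols) pair whose cell variance is <= max_variance.

    Brute force over all row and column subsets: exponential cost, so this
    only illustrates the criterion and is not an efficient search.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = []
    for r_size in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), r_size):
            for c_size in range(min_cols, n_cols + 1):
                for cols in combinations(range(n_cols), c_size):
                    if bicluster_variance(matrix, rows, cols) <= max_variance:
                        found.append((rows, cols))
    return found


# Hypothetical toy scores: rows are genes, columns are transcription factors.
scores = [
    [0.90, 0.91, 0.10],
    [0.89, 0.90, 0.50],
    [0.20, 0.70, 0.30],
]
hits = low_variance_biclusters(scores, max_variance=0.001)
print(hits)  # [((0, 1), (0, 1))]
```

In this toy matrix, only genes 0 and 1 under TFs 0 and 1 have scores tightly clustered around 0.90, so that is the single bicluster returned. An approach like the one the abstract describes would avoid enumerating every subset by using a variance bound to guide the search.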

Bibliographic details

  • Source
    Scientific Programming | 2012, Issue 1 | pp. 15-27 | 13 pages
  • Authors

    Zhen Hu; Raj Bhatnagar;

  • Affiliations

    School of Computing Sciences and Informatics, University of Cincinnati, Cincinnati, OH, USA;

    School of Computing Sciences and Informatics, University of Cincinnati, Cincinnati, OH, USA;

  • Indexed in: Engineering Index (EI), USA;
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    classification; clustering;
