Genes & Genetic Systems

CG-containing oligonucleotides and transcription factor-binding motifs are enriched in human pericentric regions



Abstract

Unsupervised data mining that can extract a wide range of information from large sequence datasets without prior knowledge or specific models is highly desirable now that big data are accumulating in research on genes, genomes and genetic systems. By treating the oligonucleotide compositions of genomic sequences as high-dimensional data, we previously modified the conventional SOM (self-organizing map) for genome informatics and established BLSOM (batch-learning SOM) for oligonucleotide composition. BLSOM can analyze more than ten million sequences simultaneously and is therefore well suited to big data analyses. Oligonucleotides often correspond to motif sequences responsible for the sequence-specific binding of proteins such as transcription factors. The distribution of such functionally important oligonucleotides is probably biased in genomic sequences and may differ among genomic regions. In this study, we constructed BLSOMs to analyze the pentanucleotide composition of 50-kb sequences derived from the human genome. The BLSOMs did not classify human sequences by chromosome; instead, they revealed several specific zones enriched for a class of CG-containing pentanucleotides, and these zones are composed primarily of sequences derived from pericentric regions. We discuss the biological significance of this enrichment of pentanucleotides in pericentric regions in connection with the cell type- and stage-dependent formation of condensed heterochromatin in the chromocenter, which forms through the association of pericentric regions from multiple chromosomes.
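To make the pipeline in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It cuts a sequence into fixed-length windows (50 kb in the study), turns each window into a 1024-dimensional pentanucleotide-frequency vector, and trains a small batch-learning SOM. In that SOM, every node is updated to the neighborhood-weighted mean of all inputs in each epoch rather than one input at a time. All function names are hypothetical. The deterministic initialization, linear neighborhood shrink and lack of further normalization are assumptions, and the published BLSOM differs in such details. `cg_fraction` is only a stand-in summary, because the abstract does not specify which "class" of CG-containing pentanucleotides was enriched.

```python
from itertools import product
import math

BASES = "ACGT"
# All 4**5 = 1024 pentanucleotides in lexicographic order; each is one vector dimension.
PENTAMERS = ["".join(p) for p in product(BASES, repeat=5)]
INDEX = {k: i for i, k in enumerate(PENTAMERS)}


def split_windows(seq, size):
    """Non-overlapping windows of `size` bases; a shorter trailing piece is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def pentamer_composition(seq):
    """Relative frequency of each pentanucleotide; 5-mers containing non-ACGT symbols (e.g. N) are skipped."""
    seq = seq.upper()
    counts = [0] * len(PENTAMERS)
    total = 0
    for i in range(len(seq) - 4):
        idx = INDEX.get(seq[i:i + 5])
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(PENTAMERS)


def cg_fraction(vec):
    """Hypothetical summary: summed frequency of pentanucleotides containing at least one CG."""
    return sum(f for k, f in zip(PENTAMERS, vec) if "CG" in k)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, nodes):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(x, nodes[j]))


def batch_som(data, rows, cols, epochs=10):
    """Minimal batch-learning SOM on a rows x cols grid (node j sits at row j // cols, column j % cols)."""
    n_nodes = rows * cols
    dim = len(data[0])
    # Assumed deterministic start: nodes copy evenly spaced input vectors.
    nodes = [list(data[j * len(data) // n_nodes]) for j in range(n_nodes)]
    for epoch in range(epochs):
        # Gaussian neighborhood radius shrinks linearly, floored at 0.5 grid units.
        sigma = max(0.5, max(rows, cols) / 2 * (1 - epoch / epochs))
        # Batch step 1: assign every input to its best-matching unit under the current weights.
        bmus = [best_matching_unit(x, nodes) for x in data]
        # Batch step 2: each node becomes the neighborhood-weighted mean of all inputs.
        new_nodes = []
        for j in range(n_nodes):
            rj, cj = divmod(j, cols)
            acc = [0.0] * dim
            wsum = 0.0
            for x, b in zip(data, bmus):
                rb, cb = divmod(b, cols)
                h = math.exp(-((rj - rb) ** 2 + (cj - cb) ** 2) / (2 * sigma ** 2))
                wsum += h
                for d in range(dim):
                    acc[d] += h * x[d]
            new_nodes.append([a / wsum for a in acc])
        nodes = new_nodes
    return nodes


if __name__ == "__main__":
    # Toy sequences standing in for 50-kb windows (real use: split_windows(chrom_seq, 50_000)).
    seqs = ["CGCGCGCGCG", "CGCGCGCGCA", "ATATATATAT", "ATATATATAA"]
    vecs = [pentamer_composition(s) for s in seqs]
    nodes = batch_som(vecs, rows=1, cols=2)
    for s, v in zip(seqs, vecs):
        print(s, round(cg_fraction(v), 2), best_matching_unit(v, nodes))
```

On these toy inputs, the two CG-rich sequences share one node and the two AT-rich sequences share the other. Because the update is a batch mean over all inputs, the result does not depend on input order, which matters for very large datasets. Pure Python is far too slow for the millions of vectors the abstract mentions, however. A real analysis would need vectorized or compiled code.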
