Conference: Workshop on Genome Informatics

A novel bioinformatic strategy for unveiling hidden genome signatures of eukaryotes: self-organizing map of oligonucleotide frequency.



Abstract

As the number of available genome sequences grows, novel tools are needed for comprehensive analysis of species-specific sequence characteristics across a wide variety of genomes. We used an unsupervised neural network algorithm, Kohonen's self-organizing map (SOM), to analyze di- and trinucleotide frequencies in 9 sequenced eukaryotic genomes (1.2 Gb in total): S. cerevisiae, S. pombe, C. elegans, A. thaliana, D. melanogaster, Fugu, and rice, as well as P. falciparum chromosomes 2 and 3 and human chromosomes 14, 20, 21, and 22, which have been almost completely sequenced. Each genomic sequence, at each window size, was encoded as a 16- or 64-dimensional vector giving the relative frequencies of di- or trinucleotides, respectively. Analysis of a total of 120,000 nonoverlapping 10-kb sequences, and of overlapping 100-kb sequences taken with a moving step of 10 kb, all derived from the 1.2 Gb of genomic sequence, yielded clear species-specific separation of most sequences on the SOMs. In most of the 120,000 10-kb sequences, the unsupervised algorithm recognized the species-specific characteristics (key combinations of oligonucleotide frequencies) that serve as a signature of each genome. Because their classification power is very high, SOMs can provide fundamental bioinformatic strategies for extracting a wide range of genomic information that could not otherwise be obtained.
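The encoding and clustering steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`kmer_frequencies`, `windows`, `train_som`, `best_matching_unit`), the grid size, the learning-rate and neighborhood schedules, and the random initialization are all assumptions chosen for brevity, and the abstract does not say which SOM variant or parameters were used.

```python
from itertools import product
import math
import random

BASES = "ACGT"

def kmer_frequencies(seq, k):
    """Relative k-mer frequencies as a 4**k-dimensional list (k=2 -> 16, k=3 -> 64)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]

def windows(seq, size, step):
    """Yield windows of `size` bases; step == size gives nonoverlapping windows."""
    for start in range(0, len(seq) - size + 1, step):
        yield seq[start:start + size]

def best_matching_unit(grid, v):
    """Return (row, col) of the grid node whose weight vector is closest to v."""
    best = None
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, v))
            if best is None or d < best[0]:
                best = (d, r, c)
    return best[1], best[2]

def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighborhood (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood radius
        for v in vectors:
            br, bc = best_matching_unit(grid, v)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))
                    w = grid[r][c]
                    for j in range(dim):
                        w[j] += lr * h * (v[j] - w[j])
    return grid
```

Under these assumptions, the 10-kb analysis would correspond to `windows(genome, 10_000, 10_000)` and the overlapping 100-kb analysis to `windows(genome, 100_000, 10_000)`, with each window passed through `kmer_frequencies` at `k=2` or `k=3` before training. After training, labeling each node by the species of the windows that map to it is one way to inspect species-specific separation.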
