World Multi-Conference on Systemics, Cybernetics and Informatics: Applications of Informatics and Cybernetics in Science and Engineering

A novel bioinformatics strategy for unveiling hidden characteristics in genome sequences and searching in silico for genetic signal sequences


Abstract

Novel bioinformatics tools are needed for comprehensive analyses of the massive amounts of genomic DNA sequence now available. The self-organizing map (SOM), an unsupervised neural network algorithm, is an effective tool for clustering and visualizing high-dimensional, complex data on a single map. We generated SOMs for tri-, tetra-, and pentanucleotide frequencies in 300,000 10-kb sequences (3 Gb in total) from 13 eukaryotes for which nearly complete genome sequences are available. For most 10-kb sequences, SOM recognized species-specific characteristics (key combinations of oligonucleotide frequencies), allowing the sequences to be classified by species without any information about their species of origin. Because this classification power is very high, SOM is an efficient and powerful tool for extracting a wide range of genomic information. A SOM constructed from oligonucleotide frequencies in 10-kb sequences covering 2.8 Gb of human sequence identified oligonucleotides whose frequencies were characteristically biased away from the random occurrence expected from the mononucleotide composition, and 10-kb sequences rich in these oligonucleotides were self-organized on the map. Because these oligonucleotides often corresponded to genetic signals or their constituent elements, we propose an in silico method that should be useful for identifying genetic signal sequences in genomes for which large amounts of sequence data are available but additional experimental data are lacking.
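The abstract combines two computations: oligonucleotide frequency vectors for fixed 10-kb windows, which are then clustered with a SOM, and a comparison of observed oligonucleotide frequencies against those expected from mononucleotide composition alone. The sketch below, assuming Python 3.9+ with NumPy, illustrates both under stated assumptions; it is not the authors' implementation. All function names (`split_windows`, `kmer_frequencies`, `obs_exp_ratio`, `train_som`, `map_to_nodes`) are hypothetical. The "expected" frequency is taken as the product of the constituent base frequencies, which is one plausible reading of "random occurrence predicted from the mononucleotide composition". The SOM is a plain online Kohonen map with a linearly decaying learning rate and a Gaussian neighbourhood; the abstract does not state the SOM variant, map size, or training schedule, so `rows`, `cols`, `epochs`, `lr0` and `sigma0` are placeholder values.

```python
import itertools
from typing import Optional

import numpy as np

BASES = "ACGT"


def split_windows(seq: str, size: int = 10_000) -> list[str]:
    """Cut a sequence into non-overlapping windows; a trailing partial window is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def kmer_frequencies(seq: str, k: int) -> np.ndarray:
    """Relative frequencies of all 4**k oligonucleotides, in lexicographic ACGT order."""
    seq = seq.upper()  # count soft-masked (lowercase) bases as well
    index = {"".join(p): i for i, p in enumerate(itertools.product(BASES, repeat=k))}
    counts = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other non-ACGT symbols
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def obs_exp_ratio(seq: str, k: int) -> np.ndarray:
    """Observed k-mer frequency divided by the frequency expected from base composition."""
    observed = kmer_frequencies(seq, k)
    base_freq = dict(zip(BASES, kmer_frequencies(seq, 1)))
    expected = np.array([np.prod([base_freq[b] for b in p])
                         for p in itertools.product(BASES, repeat=k)])
    # Ratios far from 1 flag oligonucleotides biased away from random occurrence;
    # k-mers whose expected frequency is 0 get a ratio of 0.
    return np.divide(observed, expected, out=np.zeros_like(observed), where=expected > 0)


def train_som(data: np.ndarray, rows: int = 10, cols: int = 10, epochs: int = 20,
              lr0: float = 0.5, sigma0: Optional[float] = None, seed: int = 0) -> np.ndarray:
    """Online Kohonen SOM; returns node weight vectors of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2
    # Initialise every node with a randomly chosen input vector.
    weights = data[rng.integers(0, n, rows * cols)].reshape(rows, cols, -1).astype(float)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * (1 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.5)   # neighbourhood radius shrinks
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.array(np.unravel_index(np.argmin(dist), dist.shape))
            # Gaussian neighbourhood on the 2-D grid around the best-matching unit.
            h = np.exp(-np.sum((grid - bmu) ** 2, axis=-1) / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def map_to_nodes(weights: np.ndarray, data: np.ndarray) -> list[tuple[int, int]]:
    """(row, col) of the best-matching node for every input vector."""
    flat = weights.reshape(-1, weights.shape[-1])
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    return [divmod(int(i), weights.shape[1]) for i in d2.argmin(axis=1)]
```

For the species-classification experiment described above, one would build a matrix such as `np.array([kmer_frequencies(w, 4) for w in split_windows(genome)])` for each genome, stack the matrices, train a map, and check whether windows from different species land on different nodes. Substituting `obs_exp_ratio` for `kmer_frequencies` gives a map organized by deviation from compositional expectation rather than by raw frequency. Pure-Python k-mer counting is slow at gigabase scale, so this is meant only to show the logic.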
