8th World Multi-Conference on Systemics, Cybernetics and Informatics (SCI 2004), vol. 7: Applications of Informatics and Cybernetics in Science and Engineering

A novel bioinformatics strategy for unveiling hidden characteristics in genome sequences and searching in silico for genetic signal sequences

Abstract

Novel bioinformatics tools are needed for comprehensive analysis of the massive amount of genomic DNA sequence now available. The self-organizing map (SOM), an unsupervised neural network algorithm, is an effective tool for clustering and visualizing high-dimensional, complex data on a single map. We generated SOMs for tri-, tetra-, and pentanucleotide frequencies in 300,000 10-kb sequences (3 Gb in total) from 13 eukaryotes whose genomes have been almost completely sequenced. In most 10-kb sequences, the SOM recognized species-specific characteristics (key combinations of oligonucleotide frequencies), permitting species-specific classification of the sequences without any information about their species of origin. Because this classification power is very high, SOM is an efficient and powerful tool for extracting a wide range of genomic information. A SOM constructed from oligonucleotide frequencies in 10-kb sequences covering 2.8 Gb of human sequence identified oligonucleotides whose frequencies are characteristically biased away from the random occurrence predicted by mononucleotide composition; 10-kb sequences rich in these oligonucleotides self-organized on the map. Because these oligonucleotides often correspond to genetic signals or their constituent elements, we propose an in silico method that should be useful for identifying genetic signal sequences in genomes for which large amounts of sequence data are available but additional experimental data are lacking.
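The abstract combines two computational steps: (1) turning each 10-kb window into a vector of oligonucleotide frequencies and training a SOM on those vectors, and (2) flagging oligonucleotides whose observed frequency departs from the value expected from base composition alone, i.e. for a word b1...bk the expected frequency is p(b1)·...·p(bk). The following is a minimal sketch of both steps under stated assumptions. It uses a plain online SOM with random initialisation, a Gaussian neighbourhood and linearly decaying learning rate and radius. The abstract does not specify the SOM variant or training schedule, so this may differ from the authors' implementation. All helper names (`kmer_frequencies`, `obs_exp_ratio`, `windows`, `train_som`, `best_matching_unit`) and the synthetic GC-content "genomes" in the demo are hypothetical.

```python
import itertools
import math
import random

def kmer_frequencies(seq, k):
    """Return (frequencies, kmers): relative frequencies of all 4**k oligonucleotides.

    k-mers containing characters other than A/C/G/T (e.g. N) are skipped; this is
    an assumption, since the abstract does not say how ambiguous bases were handled.
    """
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers], kmers

def obs_exp_ratio(seq, k):
    """Observed k-mer frequency divided by the frequency expected from base composition."""
    freqs, kmers = kmer_frequencies(seq, k)
    base_freqs, _ = kmer_frequencies(seq, 1)  # ordered A, C, G, T
    p = dict(zip("ACGT", base_freqs))
    ratios = {}
    for word, observed in zip(kmers, freqs):
        expected = math.prod(p[b] for b in word)  # random occurrence from mononucleotides
        ratios[word] = observed / expected if expected > 0 else float("nan")
    return ratios

def windows(seq, size):
    """Non-overlapping windows of exactly `size` bases; a trailing remainder is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

def best_matching_unit(nodes, x):
    """Grid coordinate of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(nodes, key=lambda n: sum((a - b) ** 2 for a, b in zip(nodes[n], x)))

def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Plain online SOM: Gaussian neighbourhood, linearly decaying rate and radius."""
    rng = random.Random(seed)
    # Initialise each node as a copy of a randomly chosen input vector (hypothetical choice).
    nodes = {(r, c): list(rng.choice(vectors)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = vectors[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            br, bc = best_matching_unit(nodes, x)
            # Pull every node toward x, weighted by its grid distance to the winner.
            for (r, c), w in nodes.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
            step += 1
    return nodes

if __name__ == "__main__":
    demo_rng = random.Random(1)

    def random_genome(length, gc):
        # Toy "species": independent bases with a fixed GC content (purely synthetic data).
        return "".join(demo_rng.choice("GC") if demo_rng.random() < gc else demo_rng.choice("AT")
                       for _ in range(length))

    labelled = [("AT-rich", w) for w in windows(random_genome(100_000, 0.3), 10_000)]
    labelled += [("GC-rich", w) for w in windows(random_genome(100_000, 0.7), 10_000)]
    vectors = [kmer_frequencies(w, 3)[0] for _, w in labelled]  # trinucleotide vectors
    som = train_som(vectors, rows=4, cols=4)
    for (label, _), vec in zip(labelled, vectors):
        print(label, best_matching_unit(som, vec))
```

In the demo, windows from the two toy compositions would be expected to settle on different map nodes. Real genomes differ in far subtler ways than GC content, which is why the abstract uses tetra- and pentanucleotide vectors and much larger maps. Sorting `obs_exp_ratio` output by distance from 1 gives a simple ranking of over- and under-represented words of the kind the abstract associates with genetic signals.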