
GENOMIN: A SOFTWARE FRAMEWORK FOR READING GENOMIC SIGNALS



Abstract

Data mining produces models that capture and represent hidden patterns in DNA structure. Any attempt to develop and test new data mining algorithms in bioinformatics must begin with an efficient method for reading even very large FASTA files step by step. The aim of the GENOMIN software is to provide an open source platform that can work with large files such as a whole chromosome or genome sequence. We have created an open source template program, named GENOMIN, for analyzing genetic data from sequences of different sizes downloaded from NCBI servers. Large NCBI FASTA files, which store the sequences of individual chromosomes, originate on other systems such as UNIX. Processing these files on a different operating system is difficult because each system uses a different marker to indicate the end of a line. GENOMIN reads FASTA files by continuous buffered reading, without depending on the end-of-line markers. The result of this type of reading is a raw, noise-free DNA sequence covering the entire file, regardless of its size. We present three examples to demonstrate how the program can be used in biology: estimation of GC content, identification of repetitive elements, and search for sequences with different biological functions (e.g. duplicated regions or potential binding sites for transcription factors). Further development of this open source software is limited only by the researcher's programming skills. Our tests show that GENOMIN can perform various analyses on large sequence files and can work with different algorithms used in biology.
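The buffered reading described in the abstract, and the first of its three examples (GC content), can be illustrated with a short Python sketch. This is not GENOMIN's code: the abstract does not give its implementation language, buffer size, or handling of header lines. The function names `iter_sequence_chunks` and `gc_content`, the 64 KiB default buffer, the skipping of `>` header lines, and the exclusion of ambiguous bases such as N from the denominator are all assumptions made for illustration.

```python
import io


def iter_sequence_chunks(stream, buffer_size=1 << 16):
    """Yield sequence bytes from a binary FASTA stream, one buffer at a time.

    Assumption: CR and LF bytes are both discarded, so UNIX (LF), Windows
    (CRLF) and classic Mac (CR) files give the same output. Header lines
    (starting with '>') are skipped, even if they span two buffers.
    """
    in_header = False      # currently inside a '>' header line
    at_line_start = True   # the next byte is the first byte of a line
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            break
        out = bytearray()
        for byte in chunk:
            if byte in (0x0A, 0x0D):                 # LF or CR ends a line
                in_header = False
                at_line_start = True
            elif in_header:
                continue                             # drop header text
            elif at_line_start and byte == 0x3E:     # '>' opens a header
                in_header = True
                at_line_start = False
            else:
                out.append(byte)
                at_line_start = False
        if out:
            yield bytes(out)


def gc_content(stream, buffer_size=1 << 16):
    """Return the G+C fraction over unambiguous bases (A, C, G, T).

    Assumption: other symbols (e.g. N) are left out of the denominator.
    """
    gc = total = 0
    for seq in iter_sequence_chunks(stream, buffer_size):
        upper = seq.upper()
        gc += upper.count(b"G") + upper.count(b"C")
        total += sum(upper.count(base) for base in (b"A", b"C", b"G", b"T"))
    return gc / total if total else 0.0


# Usage with in-memory data that mixes CRLF, LF and CR line endings.
# For a real file, pass open("some_chromosome.fa", "rb") (hypothetical path).
example = io.BytesIO(b">chr1 test\r\nACGT\r\nGGCC\nNNAT\r")
print(gc_content(example, buffer_size=3))  # 0.6
```

The per-byte loop favors clarity over speed. Because the reader keeps only one buffer in memory at a time, the same pattern applies to chromosome-sized files, and a k-mer or motif search could consume the yielded chunks in the same way, provided it carries the last few bases over from one chunk to the next.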
