Published in: Information Processing & Management

Indexing genomic sequence libraries


Abstract

This paper describes an extensible, open-source (GPL) data repository and retrieval system that supports fast, efficient, keyword-based retrieval of genomic sequences from multiple libraries, with retrieved sequences post-processed by FASTA, Smith-Waterman, and other analysis software. The application is implemented for Linux and is written in Mumps, C, and C++, with supporting components that include the Berkeley Data Base, the Perl Compatible Regular Expression Library, GLADE, and tools such as FASTA, Smith-Waterman, and modules from EMBOSS. The package described here can quickly index data sets of up to 256 terabytes using a B-tree-based multi-dimensional data model. An example is presented that indexes the text of the full NCBI GenBank library. (C) 2003 Elsevier Ltd. All rights reserved.
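The abstract does not show the index layout, so the following is only a minimal sketch of the general idea of keyword retrieval over a sorted, multi-subscript key space. It is written in Python rather than Mumps, and a sorted in-memory list stands in for the B-tree. The class `SortedTupleIndex`, the helpers `index_record` and `search`, the subscript order (keyword, library, accession), and the sample library names and accessions are all hypothetical and are not taken from the paper.

```python
import bisect
import re


class SortedTupleIndex:
    """Sorted list of tuple keys; an in-memory stand-in for a
    B-tree-backed multi-dimensional array (assumption, not the paper's code)."""

    def __init__(self):
        self._keys = []

    def set(self, *subscripts):
        # Insert the key at its sorted position, skipping duplicates.
        i = bisect.bisect_left(self._keys, subscripts)
        if i == len(self._keys) or self._keys[i] != subscripts:
            self._keys.insert(i, subscripts)

    def query(self, *prefix):
        # Yield every key whose leading subscripts equal `prefix`, in sorted order.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i]
            i += 1


def index_record(index, library, accession, description):
    # Store one (keyword, library, accession) key per distinct word.
    for word in set(re.findall(r"[a-z0-9]+", description.lower())):
        index.set(word, library, accession)


def search(index, keyword):
    # Return (library, accession) pairs for an exact keyword match.
    return [(lib, acc) for _, lib, acc in index.query(keyword.lower())]


if __name__ == "__main__":
    idx = SortedTupleIndex()
    # Hypothetical records from two hypothetical libraries.
    index_record(idx, "genbank", "ACC0001", "Hypothetical protein kinase mRNA")
    index_record(idx, "embl", "ACC0002", "Putative kinase gene, partial cds")
    index_record(idx, "genbank", "ACC0003", "Ribosomal RNA gene")
    print(search(idx, "kinase"))
```

Because the keys are kept in sorted order, a keyword lookup is a single position search followed by a short sequential scan, which is the same access pattern a B-tree range scan provides. The accessions returned by such a lookup could then be handed to an alignment tool such as FASTA for post-processing, as the abstract describes.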
