Towards computational improvement of DNA database indexing and short DNA query searching

Abstract

To facilitate and speed up searches of massive DNA databases, the database is first indexed using a mapping function. Searching the indexed data structure then identifies exact query hits. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, the starting locations and the number of potential genes can be determined. This is particularly relevant when unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is time-consuming. In this paper, we propose a fast DNA database indexing and searching approach that identifies all query hits in the database without having to examine all entries in the indexed data structure, while limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, assuming there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at certain starting positions are not reported if the database is searched against a query shorter than the length of the DNA database words being mapped. A solution to this drawback is also presented.
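
The general idea of indexing fixed-length DNA words with a mapping function and then searching for a short query can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the base-4 encoding, the word length `w`, and the function names `encode`, `build_index`, and `search` are hypothetical, and the code reproduces neither the paper's indexing equation nor Reneker's method. The tail scan at the end of `search` shows one plausible way a fixed-word index can miss hits for queries shorter than the word length, namely when the hit starts too close to the end of the database for a full word to begin there. Whether this matches the drawback described in the abstract is also an assumption.

```python
from collections import defaultdict

# Assumed 2-bit encoding; the paper's own mapping function is not given here.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer by reading it as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + CODE[base]
    return value


def build_index(db, w):
    """Index every w-length word of db by its integer key (rolling update)."""
    index = defaultdict(list)
    if len(db) < w:
        return index
    key = encode(db[:w])
    index[key].append(0)
    top = 4 ** (w - 1)
    for pos in range(1, len(db) - w + 1):
        # Drop the leading base, shift left, append the new trailing base.
        key = (key % top) * 4 + CODE[db[pos + w - 1]]
        index[key].append(pos)
    return index


def search(index, db, query, w):
    """Return sorted start positions of exact hits for a query of length <= w."""
    q = len(query)
    if q > w:
        raise ValueError("query longer than the indexed word length")
    # Every word that starts with the query falls in one contiguous key range.
    span = 4 ** (w - q)
    low = encode(query) * span
    hits = []
    for key in range(low, low + span):
        hits.extend(index.get(key, []))
    # No indexed word starts in the last w - 1 positions of db, so a shorter
    # query that matches there must be found by scanning the tail directly.
    tail_start = max(len(db) - w + 1, 0)
    for pos in range(tail_start, len(db) - q + 1):
        if db[pos:pos + q] == query:
            hits.append(pos)
    return sorted(hits)


db = "ACGTACGTAC"
index = build_index(db, 4)
print(search(index, db, "AC", 4))  # [0, 4, 8]
```

In this toy database, the hit at position 8 comes only from the tail scan: no 4-letter word starts at position 8, so a lookup that consulted the index alone would report just positions 0 and 4.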