Ordered Minimal Perfect Hash of the Human Genome and Implications for Duplicate Finding

Abstract

Hashing long strings is difficult, especially when the alphabet is small. Chess and Go game board hashing has almost always been accomplished by using (letter, position) pairs to index into a table of random numbers, which are exclusive-ORed together to create the hash value. The table of random numbers can be a huge source of different hash functions, obtained by varying any bit of any random number. Algorithms are developed here that can find hashes that are perfect, minimal, and even ordered for very large cases. The human genome is a great source of long strings over a small alphabet, so it is used as a test case here. An algorithm is presented that can solve for an ordered minimal perfect hash for the genome. It can also solve for the lesser cases of minimal perfect and perfect hash at higher speed. A statistical criterion is derived for obtaining the ordered minimal perfect hash with high probability. The algorithm and the statistical criterion lead to a duplicate finding algorithm that might prove to be the fastest for important cases.
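
The table-driven hashing described in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: the four toy keys, the 2-bit hash width, the helper names (make_table, zobrist_hash, classify, search_ordered), and the brute-force re-seeding search are all assumptions made for illustration. It only shows how (letter, position) pairs select random numbers that are XORed together, and what "perfect", "minimal", and "ordered" mean for a list of keys.

```python
import random

ALPHABET = "ACGT"  # small alphabet, as in genome strings


def make_table(length, bits, seed):
    """Build one random number per (letter, position) pair."""
    rng = random.Random(seed)
    return {(ch, pos): rng.getrandbits(bits)
            for pos in range(length) for ch in ALPHABET}


def zobrist_hash(s, table):
    """XOR together the table entries selected by each (letter, position)."""
    h = 0
    for pos, ch in enumerate(s):
        h ^= table[(ch, pos)]
    return h


def classify(keys, table):
    """Return (perfect, minimal, ordered) for a list of distinct keys.

    perfect: no two keys collide.
    minimal: perfect, and the hashes are exactly 0 .. len(keys)-1.
    ordered: minimal, and the i-th key hashes to i.
    """
    hashes = [zobrist_hash(k, table) for k in keys]
    n = len(keys)
    perfect = len(set(hashes)) == n
    minimal = perfect and set(hashes) == set(range(n))
    ordered = minimal and hashes == list(range(n))
    return perfect, minimal, ordered


def search_ordered(keys, bits, max_seeds=100_000):
    """Brute-force stand-in (an assumption, not the paper's solver):
    try new random tables until one is ordered minimal perfect."""
    length = len(keys[0])
    for seed in range(max_seeds):
        table = make_table(length, bits, seed)
        if classify(keys, table)[2]:
            return seed, table
    return None


# Hypothetical toy key set: four equal-length strings, 2-bit hashes.
keys = ["ACG", "CGT", "GTA", "TAC"]
result = search_ordered(keys, bits=2)
if result is not None:
    seed, table = result
    print(seed, [zobrist_hash(k, table) for k in keys])
```

Re-seeding the whole table is only workable for toy inputs like this one. The abstract instead describes varying individual bits of the random numbers and solving for the ordered case directly, which this sketch does not attempt.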

Bibliographic record

Source
WRI World Congress on Computer Science and Information Engineering, 2009, pp. 106-111 (6 pages)
Author
Zobrist A.L.
Original format: PDF
Keywords
file organisation; random number generation; statistical analysis; GO game board; chess game board; duplicate finding algorithm; hash functions; human genome; ordered minimal perfect hash; probability; random numbers; small alphabet strings; statistical criterion

Similar documents

1. Minimal Perfect Hashing-Based Information Collection Protocol for RFID Systems [J]. Xin Xie, Xiulong Liu, Keqiu Li. IEEE Transactions on Mobile Computing, 2017, no. 10
2. Fast and Scalable Minimal Perfect Hashing for Massive Key Sets [J]. Antoine Limasset, Guillaume Rizk, Rayan Chikhi. LIPIcs: Leibniz International Proceedings in Informatics, 2017, no. 30
3. Minimal perfect hashing: A competitive method for indexing internal memory [J]. Botelho F.C., Lacerda A., Menezes G.V. Information Sciences: An International Journal, 2011, no. 13
4. Ordered Minimal Perfect Hash of the Human Genome and Implications for Duplicate Finding [C]. Albert Lindsey Zobrist. WRI World Congress on Computer Science and Information Engineering, 2009
5. Parallel generation of ordered minimal perfect hash functions [D]. Siska, Charles Paul, Jr., 1998
6. The social genome: Current findings and implications for the study of human genetics [O]. Benjamin W. Domingue, Daniel W. Belsky, 2017
7. Finding minimal perfect hash functions [O]. Gary Haggard, Kevin Karplus, 1986
