WRI World Congress on Computer Science and Information Engineering

Ordered Minimal Perfect Hash of the Human Genome and Implications for Duplicate Finding



Abstract

Hashing long strings is difficult, especially when the alphabet is small. Chess and Go board hashing has almost always been done by using (letter, position) pairs to index into a table of random numbers, which are then exclusive-or'd together to create the hash value. The table of random numbers can be a huge source of different hash functions, obtained by varying any bit of any random number. Algorithms are developed here that can find hashes that are perfect, minimal, and even ordered for very large cases. The human genome is a rich source of long strings over a small alphabet, so it is used as a test case here. An algorithm is presented that can solve for an ordered minimal perfect hash for the genome. It can also solve the lesser cases of minimal perfect and perfect hashes at higher speed. A statistical criterion is derived for obtaining the ordered minimal perfect hash with high probability. The algorithm and the statistical criterion lead to a duplicate-finding algorithm that might prove to be the fastest for important cases.
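The table-lookup scheme the abstract describes for game boards is commonly known as Zobrist hashing. The following is a minimal Python sketch of that idea applied to short DNA strings, not the paper's algorithm for finding perfect, minimal, or ordered hashes. The names `ALPHABET`, `make_table`, and `zobrist_hash`, the 32-bit word size, and the fixed seed are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"  # assumed four-letter DNA alphabet


def make_table(length, bits=32, seed=0):
    """Build one random `bits`-bit number per (position, letter) pair."""
    rng = random.Random(seed)
    return [{c: rng.getrandbits(bits) for c in ALPHABET} for _ in range(length)]


def zobrist_hash(s, table):
    """XOR together the table entries selected by each (letter, position) pair."""
    h = 0
    for pos, letter in enumerate(s):
        h ^= table[pos][letter]
    return h


table = make_table(length=8)
h1 = zobrist_hash("ACGTACGT", table)

# Flipping a single bit of a single table entry yields a different hash
# function: every string with 'A' at position 0 has that bit flipped in
# its hash, while all other strings keep their old hash value.
table[0]["A"] ^= 1 << 5
h2 = zobrist_hash("ACGTACGT", table)
assert h1 ^ h2 == 1 << 5
```

Because each table bit affects only the strings that select its entry, a search over table bits can change the hashes of a chosen subset of keys without disturbing the rest. This is presumably what makes the table such a large source of candidate hash functions.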
