Conference: IEEE International Symposium on Information Theory

Capacity and Expressiveness of Genomic Tandem Duplication


Abstract

The majority of the human genome consists of repeated sequences. An important type of repeat common in the human genome is the tandem repeat, in which identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat, which can be generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplications of bounded length. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3, and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the expressiveness of a tandem duplication system as the feasibility of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noticing that a system with capacity equal to 1 is expressive, we prove that for alphabet size ≥ 4 the capacity is strictly smaller than 1, independent of the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size is 4) is related to a result by Axel Thue from 1906, which states that there exist arbitrarily long sequences with no tandem repeats (square-free sequences) for alphabet size ≥ 3. Finally, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.
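The duplication step used in the abstract's example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the function names `tandem_duplicate`, `descendants`, and `is_square_free` are hypothetical, positions are 0-indexed, and the brute-force enumeration is only practical for very short strings.

```python
from __future__ import annotations


def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Copy the block s[start:start+length] and insert the copy right after it."""
    if length < 1 or start < 0 or start + length > len(s):
        raise ValueError("duplicated block must lie inside the string")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


def descendants(seed: str, max_dup_len: int, target_len: int) -> set[str]:
    """Strings of exactly target_len reachable from seed by tandem
    duplications of length at most max_dup_len (brute force)."""
    frontier = {seed}
    reached: set[str] = set()
    while frontier:
        next_frontier: set[str] = set()
        for s in frontier:
            if len(s) == target_len:
                reached.add(s)
                continue
            for k in range(1, max_dup_len + 1):
                if len(s) + k > target_len:
                    break  # longer duplications would overshoot target_len
                for i in range(len(s) - k + 1):
                    next_frontier.add(tandem_duplicate(s, i, k))
        frontier = next_frontier
    return reached


def is_square_free(s: str) -> bool:
    """True if s contains no tandem repeat, i.e. no substring of the form xx."""
    n = len(s)
    for k in range(1, n // 2 + 1):
        for i in range(n - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return False
    return True
```

With these definitions, `tandem_duplicate("AGTCTGC", 4, 2)` duplicates the block TG and yields AGTCTGTGC, the sequence from the abstract. Counting `len(descendants(seed, k, n))` for growing `n` gives a rough feel for how the number of reachable sequences depends on the seed and on the duplication length bound, which is the quantity the capacity results concern.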
