Discovering related DNA sequences via mutual information

Abstract

One of the problems in DNA and protein sequence comparisons is to decide whether the observed similarity should be explained by the sequences' relatedness or by the mere presence of some shared internal structure, e.g., shared internal repetitive patterns. Machine discovery of related DNA sequences critically depends on a solution to this problem. In this paper we propose a general solution that is based on minimal length encoding: we measure mutual information between two sequences as the difference between the encoding length of one of the sequences and its encoding length relative to the other sequence; two sequences are considered similar if the mutual information exceeds a threshold of significance; the significance is determined using an extension of the newly proposed algorithmic significance method. We show that mutual information factors out sequence similarity that is due to shared internal structure and thus enables discovery of truly related sequences. In addition to this general method, we also propose an efficient way to compare sequences based on their subword composition that does not require any a priori assumptions about k-tuple length.
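
The definition in the abstract (encoding length of a sequence minus its encoding length relative to the other sequence) can be illustrated with a rough sketch. The code below is not the paper's encoding scheme: it assumes DEFLATE (Python's `zlib`) as a stand-in encoder and approximates "encoding relative to the other sequence" by passing that sequence as a preset dictionary. The function names and the `threshold_bits` default are hypothetical; in particular, the threshold is an arbitrary illustrative value and is not derived from the algorithmic significance method.

```python
import random
import zlib


def encoded_bits(seq: str, reference: str = "") -> int:
    """Approximate encoding length of `seq` in bits using raw DEFLATE.

    If `reference` is given, it is passed as a preset dictionary, so `seq`
    may be encoded by pointing back into `reference`. DEFLATE only looks
    back roughly 32 KiB, so this stand-in suits short sequences only.
    """
    kwargs = {"level": 9, "wbits": -15}  # raw stream: no header, no checksum
    if reference:
        kwargs["zdict"] = reference.encode("ascii")
    comp = zlib.compressobj(**kwargs)
    return 8 * len(comp.compress(seq.encode("ascii")) + comp.flush())


def mutual_information_bits(target: str, other: str) -> int:
    """Encoding length of `target` alone minus its length relative to `other`."""
    return encoded_bits(target) - encoded_bits(target, reference=other)


def is_related(target: str, other: str, threshold_bits: int = 100) -> bool:
    """Hypothetical decision rule: `threshold_bits` is illustrative only."""
    return mutual_information_bits(target, other) > threshold_bits


# Toy data (assumed for illustration, not taken from the paper).
rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(2000))

# A "relative": a copy of the ancestor with 20 point substitutions.
bases = list(ancestor)
for pos in rng.sample(range(len(bases)), 20):
    bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
relative = "".join(bases)

# An independent random sequence, and two sequences that merely share
# an internal repeat structure.
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))
repeat_a = "AC" * 1000
repeat_b = "CA" * 1000

mi_related = mutual_information_bits(relative, ancestor)
mi_unrelated = mutual_information_bits(unrelated, ancestor)
mi_repeat = mutual_information_bits(repeat_b, repeat_a)

print("mutated copy vs. ancestor:", mi_related, "bits")
print("independent vs. ancestor: ", mi_unrelated, "bits")
print("repeat vs. repeat:        ", mi_repeat, "bits")
```

In this toy setup the mutated copy should score far higher than the other two pairs. The two repetitive sequences look alike, but each already compresses well on its own, so knowing one saves very little when encoding the other. That is the sense in which the measure factors out similarity due to shared internal structure.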
