
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis



Abstract

Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of the Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because it shares properties with families of other divergence measures and is interpretable in different domains, including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise from a number of attributes, including its generalization to any number of probability distributions and the association of weights with the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes, including those of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. The proposed Tsallis-Markovian generalization, in turn, yielded more pronounced improvements than either the Tsallis or the Markovian generalization alone, specifically when the sequences being compared arose from phylogenetically proximal organisms.
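For orientation, the weighted JSD discussed above takes the form JSD(P1, P2) = H(w1*P1 + w2*P2) - w1*H(P1) - w2*H(P2), and one common Tsallis generalization simply replaces the Shannon entropy H by H_q(P) = (1 - sum_i p_i^q)/(q - 1), which recovers H as q -> 1. The sketch below is a minimal illustration of this idea for DNA sequences, not the authors' implementation: it estimates k-mer (block) frequencies rather than the conditional Markovian entropies developed in the paper, and the function names, default weights and example sequences are hypothetical.

    from collections import Counter
    from math import log

    def kmer_distribution(seq, k):
        """Normalized frequencies of overlapping k-mers in a symbolic sequence."""
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = sum(counts.values())
        return {kmer: c / total for kmer, c in counts.items()}

    def tsallis_entropy(dist, q):
        """Tsallis entropy H_q(P) = (1 - sum_i p_i^q) / (q - 1); Shannon entropy as q -> 1."""
        probs = list(dist.values())
        if abs(q - 1.0) < 1e-9:
            return -sum(p * log(p) for p in probs if p > 0)
        return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

    def generalized_jsd(seq1, seq2, q=1.0, k=1, w1=0.5, w2=0.5):
        """Weighted Jensen-Shannon divergence between two sequences, computed on
        order-k block (k-mer) distributions with Tsallis entropy of index q.
        q = 1 and k = 1 recover the standard symbol-level JSD."""
        p1 = kmer_distribution(seq1, k)
        p2 = kmer_distribution(seq2, k)
        support = set(p1) | set(p2)
        mixture = {s: w1 * p1.get(s, 0.0) + w2 * p2.get(s, 0.0) for s in support}
        return (tsallis_entropy(mixture, q)
                - w1 * tsallis_entropy(p1, q)
                - w2 * tsallis_entropy(p2, q))

    if __name__ == "__main__":
        a = "ATGCGCGATATATCGCGAT"   # toy sequences, for illustration only
        b = "GGGCCCGGGCCCGGGCCCG"
        print(generalized_jsd(a, b, q=1.0, k=1))   # standard JSD on mononucleotides
        print(generalized_jsd(a, b, q=1.5, k=2))   # Tsallis variant on dinucleotide blocks

In segmentation applications the weights w1 and w2 are often taken proportional to the lengths of the subsequences being compared; the fixed 0.5 defaults here are only a convenient assumption.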

Bibliographic record

  • Journal: other
  • Authors: Miguel A. Ré; Rajeev K. Azad
  • Volume (Issue): 9(4)
  • Pages: e93532
  • Total pages: 11
  • Format: PDF
