Journal: PLOS Computational Biology

Tight basis cycle representatives for persistent homology of large biological data sets

Abstract

Persistent homology (PH) is a popular tool for topological data analysis that has found applications across diverse areas of research. It provides a rigorous method to compute robust topological features in discrete experimental observations that often contain various sources of uncertainties. Although powerful in theory, PH suffers from high computation cost that precludes its application to large data sets. Additionally, most analyses using PH are limited to computing the existence of nontrivial features. Precise localization of these features is not generally attempted because, by definition, localized representations are not unique and because of even higher computation cost. Such a precise location is a sine qua non for determining functional significance, especially in biological applications. Here, we provide a strategy and algorithms to compute tight representative boundaries around nontrivial robust features in large data sets. To showcase the efficiency of our algorithms and the precision of computed boundaries, we analyze the human genome and protein crystal structures. In the human genome, we found a surprising effect of the impairment of chromatin loop formation on loops through chromosome 13 and the sex chromosomes. We also found loops with long-range interactions between functionally related genes. In protein homologs with significantly different topology, we found voids attributable to ligand-interaction, mutation, and differences between species.

Author summary

The relative arrangement of constituents in a biological system is often functionally significant. Persistent homology computes the existence of regions devoid of constituents that are surrounded by regions of high density, which we can think of as holes, that are robust to experimental uncertainties. An important question then is what purpose do these robust topological features serve in the underlying system? To investigate this, it is important to compute their precise locations. However, this computation suffers from high cost and non-uniqueness of representative boundaries of these holes. In this work, we developed a set of algorithms and a strategy that computes representative boundaries around holes with high precision in large data sets. We were able to process the human genome at a high resolution in a few minutes, a computation that extant algorithms could not attempt. We also determined locations of significant topological differences in crystal structures of protein homologous sequences. This work enables research into the functional significance of robust features in large biological data sets.
