Conference: IEEE International Conference on Bioinformatics and Biomedicine

HFM: Hierarchical Feature Moment Extraction for Multi-Omic Data Visualization


Abstract

Sequencing the DNA of the estimated 7.5 billion living humans would generate 1.4 zettabytes of data. Yet with current per-read rendering techniques, even a single DNA alignment file of around 200 gigabytes can be resource-intensive to visualize at arbitrary scale. Going from human DNA and RNA sequencing data to biological insight requires domain knowledge in addition to computational methods that are bound by time and space. We address these limitations by integrating a parallel out-of-core feature extraction algorithm with a disk-based hierarchical data store that provides a speed-up of several orders of magnitude for common analysis and visualization tasks. To demonstrate the effectiveness of our strategy, we have developed a web-based REST service that serves translated data to a real-time genomic viewer, which in turn renders standardized moments as stacked-area graphs of features in milliseconds for multiple samples using a familiar genome browser interface. Unlike per-read techniques, which read a variable number of rows from the sequence alignment file depending on the region of interest, our data structure returns a controllable amount of data for that region, making the technique well suited to visualization and macro-level insight into large cohorts. The strategy works well for high-coverage, single-coordinate-based visualization and could be extended to other long-range visualization techniques. We detail our open-source Cython/Python-based implementation and our prototype web-based visualization tool, and then compare the resulting performance against established visualization tools.
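
The idea of returning a controllable amount of data for any region can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify their data layout. The sketch assumes a per-position signal (for example, read depth), stores additive moments (count, sum, sum of squares) per bin, and builds coarser levels by merging neighbouring bins. The names `Moments`, `build_levels`, `query`, and the parameters `base_bin` and `max_bins` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Moments:
    """Additive summary of one bin: count, sum, and sum of squares."""
    n: int
    s: float
    ss: float

    def merge(self, other: "Moments") -> "Moments":
        # Moments are additive, so coarser bins need no access to raw data.
        return Moments(self.n + other.n, self.s + other.s, self.ss + other.ss)

    @property
    def mean(self) -> float:
        return self.s / self.n if self.n else 0.0

    @property
    def variance(self) -> float:
        # Population variance; clamped at zero to absorb rounding error.
        if not self.n:
            return 0.0
        return max(self.ss / self.n - self.mean ** 2, 0.0)


def build_levels(signal: Sequence[float], base_bin: int = 4) -> List[List[Moments]]:
    """Level 0 summarizes `base_bin` positions per bin (assumed value);
    each higher level merges adjacent pairs of bins from the level below."""
    level0 = []
    for start in range(0, len(signal), base_bin):
        chunk = signal[start:start + base_bin]
        level0.append(
            Moments(len(chunk), float(sum(chunk)), float(sum(x * x for x in chunk)))
        )
    levels = [level0]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        merged = []
        for i in range(0, len(prev), 2):
            pair = prev[i:i + 2]
            merged.append(pair[0] if len(pair) == 1 else pair[0].merge(pair[1]))
        levels.append(merged)
    return levels


def query(levels: List[List[Moments]], start: int, end: int,
          max_bins: int, base_bin: int = 4) -> List[Moments]:
    """Return summaries covering [start, end) from the finest level
    that needs no more than `max_bins` bins."""
    for depth, level in enumerate(levels):
        width = base_bin * (2 ** depth)   # positions covered by one bin
        first = start // width
        last = (end - 1) // width
        if last - first + 1 <= max_bins:
            return level[first:last + 1]
    return levels[-1]  # fall back to the coarsest level


signal = list(range(16))            # toy per-position signal
levels = build_levels(signal)
overview = query(levels, 0, 16, max_bins=2)
detail = query(levels, 0, 16, max_bins=100)
```

In this toy example the same region is answered with two bins or four bins depending on `max_bins`, so the response size depends on the requested resolution rather than on how many reads overlap the region. A disk-based store, parallel extraction, and the REST layer described in the abstract are outside the scope of this sketch.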
