HFM: Hierarchical Feature Moment Extraction for Multi-Omic Data Visualization

Abstract

Sequencing the DNA of the estimated 7.5 billion living humans would generate 1.4 zettabytes of data. However, given current per-read rendering techniques, even a single DNA alignment file of around 200 gigabytes can be resource intensive to visualize at arbitrary scale. Going from human DNA and RNA sequencing data to biological insight is a process that requires domain knowledge in addition to computational methods that are bound by time and space. We address these limitations by integrating a parallel out-of-core feature extraction algorithm with a disk-based hierarchical data store that provides several orders of magnitude speed-up for common analysis and visualization tasks. To demonstrate the effectiveness of our strategy, we have developed a web-based REST service that serves translated data to a real-time genomic viewer, which in turn renders standardized moments as stacked-area graphs of features in milliseconds for multiple samples using a familiar genome browser interface. Unlike per-read techniques, which read a variable number of rows from the sequence alignment file depending on the region of interest, our data structure returns a controllable data size for that region, making the technique well suited to visualization and macro-level insight of large cohorts. The strategy works well for high-coverage, single-coordinate-based visualization but could be extended for use in other long-range visualization techniques. We detail our open-source Cython/Python-based implementation as well as our prototype web-based visualization tool, and then compare the resulting performance against established visualization tools.
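The abstract's central idea, that a hierarchy of precomputed summaries lets a query return a bounded amount of data whatever the region size, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Moments`, `build_levels`, and `query` are hypothetical, the signal is assumed to be a per-base coverage list held in memory (standing in for the disk-based store), only the mean and standard deviation are derived, and no parallel or out-of-core processing is attempted.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Moments:
    """Power sums over a window (hypothetical type); windows merge by addition."""
    n: int       # number of values in the window
    s1: float    # sum of the values
    s2: float    # sum of the squared values

    def merge(self, other):
        return Moments(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    @property
    def mean(self):
        return self.s1 / self.n if self.n else 0.0

    @property
    def std(self):
        if self.n == 0:
            return 0.0
        variance = self.s2 / self.n - self.mean ** 2
        return sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


def build_levels(values, window, branch=2):
    """Level 0 summarises `window` consecutive values; each higher level
    merges `branch` neighbouring summaries until one summary remains."""
    level = [
        Moments(len(chunk), sum(chunk), sum(v * v for v in chunk))
        for chunk in (values[i:i + window] for i in range(0, len(values), window))
    ]
    levels = [level]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), branch):
            acc = level[i]
            for child in level[i + 1:i + branch]:
                acc = acc.merge(child)
            merged.append(acc)
        level = merged
        levels.append(level)
    return levels


def query(levels, window, start, end, max_bins, branch=2):
    """Return (depth, summaries) for the half-open region [start, end),
    using the finest level that needs at most `max_bins` summaries."""
    for depth, level in enumerate(levels):
        span = window * branch ** depth      # positions covered by one summary
        first, last = start // span, (end - 1) // span
        if last - first + 1 <= max_bins:
            return depth, level[first:last + 1]
    return len(levels) - 1, levels[-1]       # only reached if max_bins < 1
```

In this sketch, a narrow region is answered from the fine levels and a wide region from the coarse ones, so the number of summaries returned never exceeds `max_bins`. A per-read approach, by contrast, touches a number of rows that grows with the region and its coverage.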


Bibliographic details

Source: IEEE International Conference on Bioinformatics and Biomedicine, 2019, pp. 1970-1976 (7 pages)
Authors: Timothy James Becker; Dong-Guk Shin
Original format: PDF
Keywords: bioinformatics; data structures; data visualisation; DNA; feature extraction; genomics; rendering (computer graphics); RNA; Web services
Date indexed: 2022-08-26 14:34:39

