Molecular Therapy. Nucleic Acids

AptaSUITE: A Full-Featured Bioinformatics Framework for the Comprehensive Analysis of Aptamers from HT-SELEX Experiments

Abstract

Main Text

To the editor: The capability of producing and efficiently processing big data has revolutionized virtually every field of science and technology and has enabled the analysis of experimental results at unprecedented resolution. This trend is also evident in the rapid emergence, and subsequent field-wide adoption, of the high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX) protocol in the study of in vitro selection. HT-SELEX extends the traditional SELEX protocol, which aims to generate oligonucleotides with high affinity and specificity for a molecular target of interest (known as aptamers), by coupling it with HT sequencing. Selection is typically performed iteratively. An initially random pool of sequences is incubated with the target, and non-binding (non-affine) species are then partitioned and removed. The remaining pool is amplified to form both the input to the next cycle and the source material for sequencing. The resulting sequencing data, a representative sample of the pool composition after each round of selection, are then analyzed in silico with dedicated algorithms. Aptamers predicted to possess the desired application-specific properties are typically subjected to further in vitro verification and post-processing.

Notably, to guarantee an efficient and accurate in silico pipeline, these computational methods must be carefully designed to maximize efficiency. They must also scale well vertically (computation time should drop in proportion as more processing units and memory become available) and with increasing data volume.
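The iterative cycle of incubation, partitioning, and amplification can be made concrete with a minimal sketch of an idealized, deterministic selection model. It assumes that each species has a fixed, hypothetical retention probability `p_bind` during partitioning and that amplification is unbiased, so it only rescales the surviving pool. Neither assumption holds exactly in practice, and this is not AptaSIM's simulation model. The function names `selection_round` and `run_selection` are illustrative.

```python
from typing import Dict, List


def selection_round(freqs: Dict[str, float], p_bind: Dict[str, float]) -> Dict[str, float]:
    """One idealized SELEX cycle on relative frequencies.

    Partitioning keeps each species in proportion to its (hypothetical)
    retention probability; unbiased amplification then restores the pool
    size, which on relative frequencies amounts to renormalizing to 1.
    """
    retained = {seq: f * p_bind[seq] for seq, f in freqs.items()}
    total = sum(retained.values())
    if total == 0:
        raise ValueError("no species survived partitioning")
    return {seq: r / total for seq, r in retained.items()}


def run_selection(freqs: Dict[str, float], p_bind: Dict[str, float],
                  rounds: int) -> List[Dict[str, float]]:
    """Return the pool composition before selection and after each round."""
    history = [dict(freqs)]
    for _ in range(rounds):
        freqs = selection_round(freqs, p_bind)
        history.append(freqs)
    return history


if __name__ == "__main__":
    # Made-up pool: one strong binder (1%) among two weak binders
    pool = {"ACGTACGT": 0.01, "GGATCCAT": 0.495, "TTCGAAGC": 0.495}
    p_bind = {"ACGTACGT": 0.5, "GGATCCAT": 0.01, "TTCGAAGC": 0.01}
    for cycle, composition in enumerate(run_selection(pool, p_bind, 3)):
        print(f"round {cycle}: strong binder at {composition['ACGTACGT']:.3f}")
```

With these made-up values, the strong binder grows from 1% of the pool to more than 99% within three rounds. Its ratio to the weak binders increases 50-fold (0.5 / 0.01) per cycle.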
Ideally, HT-SELEX analysis tools would also have a gentle learning curve for experimentalists, be platform-independent, and provide integrated means of storing, retrieving, interacting with, and visualizing aptamer-related information.

Indeed, over the past decade, typical HT-SELEX datasets have grown 200-fold, from 10,000–100,000 reads per selection cycle to current sizes of routinely over 20 million reads per round. This growth is an emerging barrier for many well-established algorithmic tools that were devised before the big data revolution but are still actively used in RNA bioinformatics analysis pipelines. Prominent examples include clustering sequences into aptamer families related by sequence similarity, and elucidating shared primary and/or secondary structure motifs that evolve throughout the selection. Both analysis tasks are well established for small-scale datasets. With traditional methods, however, they rapidly become computationally intractable as data volume grows. More complex approaches, designed for scalable HT data processing in multi-core environments such as data centers and the cloud, typically require expert knowledge to set up a sensible pipeline. They may also depend on numerous, potentially non-portable third-party software packages, which makes long-term maintenance harder. In addition, the processed data are predominantly output as plain text or stored in relational databases, which makes the results harder to interpret efficiently.
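The scaling problem is easy to quantify for any method that compares all pairs of sequences. For n reads, such a method needs n(n − 1)/2 comparisons: about 5 × 10^9 for 100,000 reads, but about 2 × 10^14 for 20 million. A 200-fold increase in data therefore means roughly a 40,000-fold increase in work.

The sketch below is a deliberately naive, hypothetical baseline for grouping sequences into families. It performs greedy, abundance-ordered clustering by Hamming distance on equal-length sequences. It is not AptaCLUSTER's algorithm. It illustrates the kind of traditional approach whose cost grows quadratically in the worst case.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(counts: Dict[str, int], max_dist: int) -> List[List[str]]:
    """Naive abundance-ordered clustering (illustrative baseline only).

    Sequences are visited from most to least abundant; each joins the first
    family whose seed (first member) lies within max_dist mismatches,
    otherwise it seeds a new family. Worst case: O(n^2) distance computations.
    """
    families: List[List[str]] = []
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        for family in families:
            if hamming(family[0], seq) <= max_dist:
                family.append(seq)
                break
        else:  # no existing family matched
            families.append([seq])
    return families


def all_pairs(n: int) -> int:
    """Comparisons needed by an all-versus-all method on n sequences."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    counts = {"ACGTACGT": 120, "ACGTACGA": 40, "TTGCAAGC": 75, "TTGCAAGG": 5}
    print(greedy_cluster(counts, max_dist=1))
    print(all_pairs(20_000_000) / all_pairs(100_000))
```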
Finally, web-based (and therefore graphical) solutions backed by cloud services, such as the Galaxy project, are undoubtedly of great utility. They are limited, however, in how flexibly they can visualize and interact with vast amounts of data, because they must adhere to the constraints imposed by current web browsers and technologies.

To address these issues, we have developed AptaSuite, a full-featured, open-source, and platform-independent software collection for the comprehensive analysis of HT-SELEX experiments. Previous methods each implement their own, frequently rudimentary, data workflow. In stark contrast, AptaSuite provides a unified and robust framework for managing aptamer-related data, and uses this framework to serve the required data in a standardized manner to any algorithm built with the software. At its core, AptaSuite consists of a collection of carefully designed APIs (application programming interfaces) and corresponding reference implementations. These facilitate the input, output, and manipulation of aptamer data (such as sequences, aptamer counts in individual selection cycles, structure information, and more). On top of this powerful core library, a number of previously published approaches […, 15] have been implemented from scratch and are now combined into this uniform, easy-to-use framework (see Figure 1).
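The idea of one shared data layer serving many algorithms can be illustrated with a small sketch. Everything here is hypothetical: `AptamerPool`, `SelectionCycle`, and `enrichment` are illustrative names, not AptaSuite's actual (Java) API. In this sketch:

- The pool assigns each unique sequence a stable integer id.
- Each selection cycle stores only id-to-count mappings.
- An analysis function, here a simple between-round enrichment ratio, reads data exclusively through that interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


class AptamerPool:
    """Hypothetical store that gives every unique sequence a stable integer id."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._seqs: List[str] = []

    def register(self, seq: str) -> int:
        """Return the id of seq, assigning a new one on first sight."""
        if seq not in self._ids:
            self._ids[seq] = len(self._seqs)
            self._seqs.append(seq)
        return self._ids[seq]

    def id_of(self, seq: str) -> Optional[int]:
        """Look up an id without registering the sequence."""
        return self._ids.get(seq)

    def sequence(self, aptamer_id: int) -> str:
        return self._seqs[aptamer_id]

    def __len__(self) -> int:
        return len(self._seqs)


@dataclass
class SelectionCycle:
    """Hypothetical per-round view: read counts keyed by aptamer id."""

    round_number: int
    pool: AptamerPool
    counts: Dict[int, int] = field(default_factory=dict)

    def add_read(self, seq: str) -> None:
        aptamer_id = self.pool.register(seq)
        self.counts[aptamer_id] = self.counts.get(aptamer_id, 0) + 1

    def size(self) -> int:
        return sum(self.counts.values())

    def frequency(self, seq: str) -> float:
        aptamer_id = self.pool.id_of(seq)
        total = self.size()
        if aptamer_id is None or total == 0:
            return 0.0
        return self.counts.get(aptamer_id, 0) / total

    def items(self) -> Iterator[Tuple[str, int]]:
        """Yield (sequence, count) pairs for this cycle."""
        for aptamer_id, count in self.counts.items():
            yield self.pool.sequence(aptamer_id), count


def enrichment(prev: SelectionCycle, curr: SelectionCycle, seq: str) -> float:
    """Example algorithm using only the shared interface: frequency of seq in
    curr divided by its frequency in prev (inf if absent from prev)."""
    f_prev = prev.frequency(seq)
    if f_prev == 0:
        return float("inf")
    return curr.frequency(seq) / f_prev
```

In this sketch, each sequence is stored once in the pool and referenced by id. Adding a round therefore costs only one count map, and a new analysis can be written against the same two classes without its own input handling.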
In particular, the selected methods are well-established approaches to analyzing HT-SELEX data and are specifically designed to leverage particular properties of aptamers and of the SELEX process.

Figure 1. The Modularized Architecture of AptaSuite
Diagram depicting the programmatic architecture of AptaSuite. Core libraries for the storage, retrieval, and manipulation of aptamers are accessed through a well-defined API. This API, in turn, serves data to, and accepts data from, the algorithms responsible for the input, processing, and output of aptamers. The core libraries include three components:
- efficient solutions for storing primary and secondary structure information about the accepted aptamers;
- a digital representation of the selection, storing the experimental setup as well as information about the performed selection cycles;
- auxiliary tools, such as secondary structure prediction algorithms, which have been ported to Java to maintain platform independence.

The software layer currently features:
- AptaPLEX, a multithreaded demultiplexer for HT-SELEX data;
- AptaSIM, aimed at realistically simulating the selection dynamics of SELEX experiments;
- AptaCLUSTER, for the efficient determination of aptamer families;
- AptaMUT, tailored to the identification of mutants with improved binding affinity;
- AptaTRACE, an efficient algorithm for sequence-structure motif elucidation that uses the entirety of the available aptamer pools.
Finally, each computational method is accessible either from the command line or through the graphical user interface.
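Demultiplexing, the task performed by AptaPLEX, can be sketched in a few lines. This is a simplified, single-threaded illustration under stated assumptions, not AptaPLEX itself. The sketch assumes the following read layout:

- each read starts with a unique, fixed-length barcode that identifies its selection cycle;
- the barcode is immediately followed by a 5' primer;
- the randomized region ends at the first occurrence of the 3' primer.

The barcodes and primers used here are made up, and only exact matches are accepted. Real reads contain sequencing errors that a practical tool must tolerate.

```python
from typing import Dict, Iterable, List, Tuple


def demultiplex(
    reads: Iterable[str],
    barcodes: Dict[str, str],
    primer5: str,
    primer3: str,
) -> Tuple[Dict[str, List[str]], int]:
    """Split reads by selection cycle and extract the randomized region.

    Assumed (hypothetical) read layout: <barcode><primer5><random region><primer3>...
    Returns the extracted regions per cycle and the number of rejected reads.
    """
    cycles: Dict[str, List[str]] = {cycle: [] for cycle in barcodes.values()}
    rejected = 0
    for read in reads:
        # Exact barcode match at the 5' end identifies the selection cycle
        barcode = next((b for b in barcodes if read.startswith(b)), None)
        if barcode is None:
            rejected += 1
            continue
        rest = read[len(barcode):]
        # The 5' primer must follow the barcode directly
        if not rest.startswith(primer5):
            rejected += 1
            continue
        # Randomized region ends at the first 3' primer occurrence after the 5' primer
        end = rest.find(primer3, len(primer5))
        if end == -1:
            rejected += 1
            continue
        cycles[barcodes[barcode]].append(rest[len(primer5):end])
    return cycles, rejected


if __name__ == "__main__":
    reads = ["ACGTGGGATATCCC", "TGCAGGGTTTTCCC", "AAAAGGGTTTTCCC"]
    print(demultiplex(reads, {"ACGT": "round1", "TGCA": "round2"}, "GGG", "CCC"))
```

Each read is processed independently of all others. A multithreaded demultiplexer can exploit this property by distributing reads across worker threads.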