A modern biomedical research project can easily contain hundreds of analysis steps, and the lack of reproducibility of analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that, in practice, reproducibility cannot be easily achieved. Literate programming is an approach to presenting computer programs to human readers. The code is rearranged to follow the logic of the program, and that logic is explained in natural language. The code executed by the computer is extracted from the literate source code. As such, literate programming is an ideal formalism for systematizing analysis steps in biomedical research. We have developed the reproducible computing tool Lir (literate, reproducible computing), which allows a tool-agnostic approach to biomedical data analysis. We demonstrate the utility of Lir by applying it to a case study. Our aim was to investigate the role of endosomal trafficking regulators in the progression of breast cancer. In this analysis, a variety of tools were combined to interpret the available data: a relational database, standard command-line tools, and a statistical computing environment. The analysis revealed that the lipid transport-related genes LAPTM4B and NDRG1 are coamplified in breast cancer patients, and it identified genes potentially cooperating with LAPTM4B in breast cancer progression. Our case study demonstrates that with Lir, an array of tools can be combined in the same data analysis to improve efficiency, reproducibility, and ease of understanding. Lir is open-source software available at github.com/borisvassilev/lir.
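Extracting the executable code from a literate source is usually called "tangling." The sketch below illustrates that idea only, under stated assumptions: it uses noweb-style chunk markers (`<<name>>=` opens a code chunk, a line containing only `@` returns to documentation, and `<<name>>` on its own line references another chunk). The functions `collect_chunks`, `expand`, and `tangle` are hypothetical names for illustration; this is not Lir's implementation.

```python
import re

# Assumed noweb-style markup (illustrative, not Lir's actual parser):
#   "<<name>>=" on its own line starts a code chunk,
#   "@" on its own line ends it; everything else outside chunks is prose.
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def collect_chunks(source: str) -> dict[str, list[str]]:
    """Gather code lines by chunk name; repeated definitions are concatenated."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None  # back to documentation text
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(name: str, chunks: dict[str, list[str]], indent: str = "",
           seen: tuple[str, ...] = ()) -> list[str]:
    """Recursively replace chunk references, preserving their indentation."""
    if name in seen:
        raise ValueError(f"recursive chunk reference: {name}")
    out: list[str] = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(expand(ref.group(2), chunks,
                              indent + ref.group(1), seen + (name,)))
        else:
            out.append(indent + line if line else line)
    return out


def tangle(source: str, root: str) -> str:
    """Return the program text for the chunk named `root`."""
    return "\n".join(expand(root, collect_chunks(source)))


# Usage: the prose explains the steps; tangling yields runnable code in program order.
example = """\
We count the reads, then print the total.
<<count.py>>=
total = 0
<<loop over reads>>
print(total)
@
The loop body is explained separately.
<<loop over reads>>=
for read in ["ACGT", "GGCA"]:
    total += 1
@
"""
print(tangle(example, "count.py"))
# prints:
# total = 0
# for read in ["ACGT", "GGCA"]:
#     total += 1
# print(total)
```

The chunk order in the literate source follows the explanation, while `tangle` reassembles the code in the order the computer needs; this separation is what lets prose and code be read together without constraining execution order.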