Journal: BMC Bioinformatics

CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

Abstract

Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools.

Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage.

Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro.
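
The abstract names genome coverage as one of the metrics computed from pre-aligned reads but does not say how it is calculated. As an illustration only, and not CaPSID's actual implementation, the following minimal Python sketch computes one common definition: the fraction of reference bases covered by at least one aligned read. The function names (`merge_intervals`, `coverage_fraction`) and the half-open interval representation are assumptions made for this example; in practice the intervals would be extracted from a BAM file by an alignment-parsing library.

```python
from typing import Iterable, List, Tuple

# Assumption: each alignment is a half-open interval [start, end)
# in 0-based reference coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(alignments: Iterable[Interval], ref_length: int) -> float:
    """Fraction of reference bases covered by at least one alignment."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    for start, end in merge_intervals(alignments):
        # Clip to the reference bounds before counting bases.
        start = max(start, 0)
        end = min(end, ref_length)
        if end > start:
            covered += end - start
    return covered / ref_length


if __name__ == "__main__":
    # Hypothetical reads aligned to a 1,000-base pathogen genome.
    reads = [(0, 50), (25, 100), (150, 200)]
    print(coverage_fraction(reads, 1000))
```

Merging intervals before counting keeps overlapping reads from being counted twice, so the result measures breadth of coverage rather than depth.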