Journal: Bioinformatics

Estimating optimal window size for analysis of low-coverage next-generation sequence data


Abstract

Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing (<0.1×), performing 'binning' or 'windowing' on mapped short sequences ('reads') is critical to extract genomic information of interest for further evaluation, such as copy-number alteration analysis. If the window size is too small, many windows will exhibit zero counts and almost no pattern can be observed. In contrast, if the window size is too wide, the patterns or genomic features will be 'smoothed out'. Our objective is to identify an optimal window size in between the two extremes.

Results: We assume the reads density to be a step function. Given this model, we propose a data-based estimation of optimal window size based on Akaike's information criterion (AIC) and cross-validation (CV) log-likelihood. By plotting the AIC and CV log-likelihood curve as a function of window size, we are able to estimate the optimal window size that minimizes AIC or maximizes CV log-likelihood. The proposed methods are of general purpose and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets.

Availability and implementation: An R package to estimate optimal window size is available at http://www1.maths.leeds.ac.uk/~arief/R/win/
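
The selection idea described under Results can be illustrated with a minimal Python sketch. It is not the authors' R package, and the abstract does not give the exact form of either criterion, so everything here is an assumption: the read density is estimated as a histogram (a step function) over fixed-width windows, AIC is taken as -2 × log-likelihood + 2 × (number of windows - 1), and the CV score is the standard leave-one-out log-likelihood of a histogram density. The function names (`window_counts`, `aic`, `cv_loglik`, `best_window`) are hypothetical.

```python
import math


def window_widths(genome_length, window):
    """Width of each window; the last one may be shorter than `window`."""
    n_windows = math.ceil(genome_length / window)
    return [min(window, genome_length - j * window) for j in range(n_windows)]


def window_counts(positions, genome_length, window):
    """Number of read start positions falling in each window of [0, genome_length]."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for p in positions:
        # Clamp so a read sitting exactly at genome_length lands in the last window.
        j = min(int(p // window), n_windows - 1)
        counts[j] += 1
    return counts


def aic(positions, genome_length, window):
    """AIC of the histogram density estimate (assumed form, see lead-in)."""
    counts = window_counts(positions, genome_length, window)
    widths = window_widths(genome_length, window)
    n = len(positions)
    # Estimated density in window j is count_j / (n * width_j).
    loglik = sum(c * math.log(c / (n * w)) for c, w in zip(counts, widths) if c > 0)
    # Heights are constrained to integrate to one, hence (windows - 1) free parameters.
    return -2.0 * loglik + 2.0 * (len(counts) - 1)


def cv_loglik(positions, genome_length, window):
    """Leave-one-out CV log-likelihood of the histogram density (assumed form)."""
    counts = window_counts(positions, genome_length, window)
    widths = window_widths(genome_length, window)
    n = len(positions)
    total = 0.0
    for c, w in zip(counts, widths):
        if c == 1:
            # Leaving out the only read in a window gives that read zero density.
            return float("-inf")
        if c > 1:
            total += c * math.log((c - 1) / ((n - 1) * w))
    return total


def best_window(positions, genome_length, candidates, criterion="aic"):
    """Candidate window size that minimizes AIC or maximizes CV log-likelihood."""
    if criterion == "aic":
        return min(candidates, key=lambda w: aic(positions, genome_length, w))
    if criterion == "cv":
        return max(candidates, key=lambda w: cv_loglik(positions, genome_length, w))
    raise ValueError("criterion must be 'aic' or 'cv'")
```

Calling `best_window(positions, genome_length, candidates)` over a grid of candidate sizes mirrors reading the minimum off the AIC curve, and `criterion="cv"` mirrors reading the maximum off the CV log-likelihood curve. In this sketch, very small windows are penalized in two ways: AIC charges for the large number of window heights, and the leave-one-out score drops to minus infinity as soon as any window holds exactly one read.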
