Journal: Statistica Sinica

ADAPTIVE BASIS SELECTION FOR EXPONENTIAL FAMILY SMOOTHING SPLINES WITH APPLICATION IN JOINT MODELING OF MULTIPLE SEQUENCING SAMPLES


Abstract

Second-generation sequencing technologies have replaced array-based technologies and become the default method for genomics and epigenomics analysis. These technologies sequence tens of millions of DNA/cDNA fragments in parallel. After the resulting sequences (short reads) are mapped to the genome, one obtains a sequence of short-read counts along the genome. Effective extraction of signals from these short-read counts is the key to the success of sequencing technologies. Nonparametric methods, in particular smoothing splines, have been used extensively for modeling and processing single sequencing samples. However, nonparametric joint modeling of multiple second-generation sequencing samples is still lacking because of its computational cost. In this article, we develop an adaptive basis selection method for efficient computation of exponential family smoothing splines for modeling multiple second-generation sequencing samples. Our adaptive basis selection gives a sparse approximation of smoothing splines, yielding a lower-dimensional effective model space that allows more scalable computation. The asymptotic analysis shows that the effective model space is rich enough to retain essential features of the data. Moreover, exponential family smoothing spline models computed via adaptive basis selection are shown to have good statistical properties, e.g., convergence at the same rate as that of full-basis exponential family smoothing splines. The empirical performance is demonstrated through simulation studies and two second-generation sequencing data examples.
