Source: Nucleic Acids Research (U.S. National Institutes of Health literature)

kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

Abstract

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites that determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server that allows the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic data sets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at .
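To make the idea of an SVM over kmer sequence features concrete, the following is a minimal sketch in plain Python. It is not the web server's implementation. It assumes that each sequence is represented by length-normalized counts of 4-mers, that a kmer and its reverse complement are counted together, and that a bias-free linear SVM is fit by simple subgradient descent on the hinge loss. The toy sequences and all function names (`canonical`, `kmer_features`, `train_linear_svm`, `fit_toy_model`, `score`) are hypothetical and chosen only for illustration.

```python
import itertools
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_vocabulary(k):
    """All distinct canonical k-mers over A, C, G, T, in sorted order."""
    kmers = ("".join(p) for p in itertools.product("ACGT", repeat=k))
    return sorted({canonical(km) for km in kmers})


def kmer_features(seq, k, index):
    """Unit-length vector of canonical k-mer counts for one sequence."""
    counts = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(canonical(seq[i:i + k]))
        if j is not None:  # k-mers with symbols outside ACGT are skipped
            counts[j] += 1.0
    norm = sum(c * c for c in counts) ** 0.5
    return [c / norm for c in counts] if norm > 0 else counts


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Bias-free linear SVM: subgradient descent on L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            step = lr * y[i] if margin < 1 else 0.0  # hinge term is active only inside the margin
            w = [(1 - lr * lam) * wj + step * xj for wj, xj in zip(w, X[i])]
    return w


# Hypothetical toy data: positives carry GATA-like motifs on either strand.
K = 4
VOCAB = kmer_vocabulary(K)
INDEX = {km: j for j, km in enumerate(VOCAB)}
POSITIVES = ["GATAGATAGATAGATA", "AGATAAGATAAGATAA", "TTATCTTATCTTATCT"]
NEGATIVES = ["CCGCCGCCGCCGCCGC", "GGCGGCGGCGGCGGCG", "CACGTGCACGTGCACG"]


def fit_toy_model():
    X = [kmer_features(s, K, INDEX) for s in POSITIVES + NEGATIVES]
    y = [1] * len(POSITIVES) + [-1] * len(NEGATIVES)
    return train_linear_svm(X, y)


def score(seq, w):
    """SVM decision value; positive values predict the positive class."""
    return sum(wj * xj for wj, xj in zip(w, kmer_features(seq, K, INDEX)))


if __name__ == "__main__":
    w = fit_toy_model()
    ranked = sorted(zip(w, VOCAB), reverse=True)
    print("most positive k-mers:", [km for _, km in ranked[:3]])
    print("most negative k-mers:", [km for _, km in ranked[-3:]])
```

Sorting the learned weights is what makes this kind of model interpretable: kmers with large positive weights are candidate binding sites enriched in the positive set, while kmers with large negative weights are candidates for repressive or depleted sequence elements.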