Oncology Letters (via the US National Institutes of Health literature collection)

Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer

Abstract

Multiple computational tools have been widely applied to detect coding driver mutations in cancer; however, prioritizing pathogenic non-coding variants remains a difficult task. The present study aimed to distinguish disease-causing non-coding mutations from neutral ones and to prioritize potential cancer-associated long non-coding RNAs (lncRNAs) in lung cancer using a logistic regression model. The model was constructed with 19,153 disease-associated pathogenic variants from ClinVar and the Human Gene Mutation Database defining the response variable and non-coding features as the predictor variables. The model was validated with genome-wide association study (GWAS) disease- or trait-associated single nucleotide polymorphisms (SNPs) and with recurrent somatic mutations. High-scoring regions were characterized by their distribution across genomic features and gene classes, and potential cancer-associated lncRNA candidates were prioritized by combining the fraction of high-scoring regions and the average score predicted by the model. H3K79me2 contributed most negatively to the model, whereas conserved regions were the most positively informative feature. The area under the receiver operating characteristic curve (AUC) of the model was 0.89. The model assigned significantly higher scores to GWAS SNPs than to neutral SNPs (mean, 5.9012 vs. 5.5238; P<0.001, Mann-Whitney U test) and to recurrent somatic mutations than to non-recurrent mutations (mean, 5.4677 vs. 5.2277; P<0.001, Mann-Whitney U test). Regions such as splice sites and untranslated regions, and gene classes such as cancer genes and cancer-associated lncRNAs, showed increased enrichment of high-scoring regions. In total, 2,679 cancer-associated lncRNAs were identified and characterized, 104 of which were differentially expressed between lung cancer and normal specimens. The logistic regression model is a useful and efficient scoring system for prioritizing non-coding pathogenic variants and lncRNAs, and may provide a basis for detecting non-coding driver lncRNAs in lung cancer.
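The abstract names three computational steps: fitting a logistic regression on non-coding features, evaluating it by ROC AUC and Mann-Whitney U comparisons, and ranking lncRNAs by combining the fraction of high-scoring regions with the mean score. The following is a minimal pure-Python sketch of those steps under stated assumptions, not the authors' implementation. The two features (a conservation score and an H3K79me2 signal), every data value, the high-score threshold, the lncRNA names, and the ranking rule (fraction of high-scoring regions first, mean score as tie-breaker) are hypothetical; the abstract says only that the two quantities were combined. AUC is computed from the Mann-Whitney U statistic via AUC = U / (n_pos * n_neg).

```python
import math

# Hypothetical toy data: each variant is (conservation, H3K79me2 signal), scaled to [0, 1].
# Label 1 = pathogenic (a ClinVar/HGMD-style positive), 0 = neutral. All values are invented.
X = [(0.9, 0.1), (0.8, 0.3), (0.7, 0.2), (0.6, 0.6), (0.4, 0.1),
     (0.2, 0.8), (0.3, 0.6), (0.1, 0.9), (0.5, 0.7), (0.6, 0.4)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear predictor (log-odds); w[0] is the intercept."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def auc_from_u(pos, neg):
    """Mann-Whitney U of pos vs. neg (ties count 0.5) divided by n_pos * n_neg, i.e. ROC AUC."""
    u = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return u / (len(pos) * len(neg))


def rank_lncrnas(region_scores, threshold):
    """Hypothetical rule: sort by fraction of high-scoring regions, break ties by mean score."""
    stats = {}
    for gene, scores in region_scores.items():
        frac_high = sum(s >= threshold for s in scores) / len(scores)
        stats[gene] = (frac_high, sum(scores) / len(scores))
    return sorted(stats, key=lambda g: stats[g], reverse=True)


w = fit_logistic(X, y)
train_scores = [score(w, x) for x in X]
auc = auc_from_u([s for s, t in zip(train_scores, y) if t == 1],
                 [s for s, t in zip(train_scores, y) if t == 0])
print("intercept, conservation, H3K79me2 weights:", w)
print("training AUC:", round(auc, 2))

# Hypothetical per-region model scores for three made-up lncRNAs.
regions = {"LNC_A": [2.0, 1.5, -0.5], "LNC_B": [0.2, -1.0, -0.3], "LNC_C": [1.2, 1.1, 0.9]}
print(rank_lncrnas(regions, threshold=1.0))  # → ['LNC_C', 'LNC_A', 'LNC_B']
```

A real analysis would use many more features and variants, held-out validation sets such as the GWAS SNPs and recurrent mutations described above, and library implementations (for example scikit-learn's `LogisticRegression` and SciPy's `scipy.stats.mannwhitneyu`) instead of hand-rolled gradient descent.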