Oncology Letters

Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer


Abstract

Multiple computational tools have been widely applied to the detection of coding driver mutations in cancer; however, the prioritization of pathogenic non-coding variants remains a difficult and demanding task. The present study was performed to distinguish non-coding disease-causing mutations from neutral ones, and to prioritize potential cancer-associated long non-coding RNAs (lncRNAs) with a logistic regression model in lung cancer. A logistic regression model was constructed, combining 19,153 disease-associated ClinVar and Human Gene Mutation Database pathogenic variants as the response variable and non-coding features as the predictor variables. Validation of the model was conducted with genome-wide association study (GWAS) disease- or trait-associated single nucleotide polymorphisms (SNPs) and recurrent somatic mutations. High-scoring regions were characterized with respect to their distribution in various features and gene classes; potential cancer-associated lncRNA candidates were prioritized by combining the fraction of high-scoring regions and the average score predicted by the logistic regression model. H3K79me2 was the most negative factor that contributed to the model, while conserved regions were the most positively informative. The area under the receiver operating characteristic curve of the model was 0.89. The model assigned a significantly higher score to GWAS SNPs and recurrent somatic mutations compared with neutral SNPs (mean, 5.9012 vs. 5.5238; P<0.001, Mann-Whitney U test) and non-recurrent mutations (mean, 5.4677 vs. 5.2277; P<0.001, Mann-Whitney U test), respectively. It was observed that regions, including splicing sites and untranslated regions, and gene classes, including cancer genes and cancer-associated lncRNAs, had an increased enrichment of high-scoring regions. In total, 2,679 cancer-associated lncRNAs were determined and characterized. A total of 104 of these lncRNAs were differentially expressed between lung cancer and normal specimens. The logistic regression model is a useful and efficient scoring system to prioritize non-coding pathogenic variants and lncRNAs, and may provide the basis for detecting non-coding driver lncRNAs in lung cancer.
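
The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or data: it fits a small logistic regression by gradient descent on synthetic variants with two hypothetical binary features (`conserved` and `h3k79me2`, named after the two factors the abstract singles out), then computes the area under the ROC curve as the normalized Mann-Whitney U statistic. The feature frequencies, sample size, learning rate, and all function names are assumptions chosen only so that the example runs with the standard library.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Predicted probability that a variant is pathogenic.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def auc(scores_pos, scores_neg):
    """AUC as U / (n_pos * n_neg): P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def make_synthetic(n, seed=0):
    # ASSUMPTION: made-up feature frequencies, not taken from the study.
    # Label 1 = pathogenic, label 0 = neutral.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        conserved = 1 if rng.random() < (0.8 if label else 0.2) else 0
        h3k79me2 = 1 if rng.random() < (0.2 if label else 0.6) else 0
        X.append([conserved, h3k79me2])
        y.append(label)
    return X, y


X, y = make_synthetic(400)
w, b = fit_logistic(X, y)
scores = [predict(w, b, x) for x in X]
pos = [s for s, t in zip(scores, y) if t == 1]
neg = [s for s, t in zip(scores, y) if t == 0]
model_auc = auc(pos, neg)
print("weights (conserved, h3k79me2):", w)
print("AUC on the synthetic training data:", round(model_auc, 3))
```

In this toy setup the fitted weight for `conserved` comes out positive and the weight for `h3k79me2` negative, mirroring the direction of the contributions reported in the abstract. The AUC here is computed on the training data itself, so it says nothing about the 0.89 reported for the real model; a practical analysis would use held-out variants and an established library such as scikit-learn or SciPy.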
