Journal: BMC Genomics

A data science approach for the classification of low-grade and high-grade ovarian serous carcinomas

Abstract

Copy number alterations (CNAs) are defined as somatic gains or losses of DNA regions. CNA profiles may provide a fingerprint specific to a tumor type or tumor grade. Low-coverage sequencing for reporting CNAs has recently gained interest since it was successfully translated into clinical applications. Ovarian serous carcinomas can be classified into two largely mutually exclusive grades, low grade and high grade, based on their histologic features. A grade classification based on genomics may provide valuable clues on how best to manage these patients in the clinic. Based on a study of ovarian serous carcinomas, we explore a methodology that combines CNA reporting from low-coverage sequencing with machine learning techniques to stratify tumor biospecimens of different grades.

We have developed a data-driven methodology for tumor classification using CNA profiles reported by low-coverage sequencing. The proposed method, called Bag-of-Segments, is used to derive fixed-length CNA features predictive of tumor grade. These features are further processed by machine learning techniques to obtain classification models. High accuracy is obtained for classifying ovarian serous carcinomas into high and low grades in leave-one-out cross-validation experiments. Models that are only weakly influenced by sequence coverage and sample purity can also be built, which would be of higher relevance for clinical applications. The patterns captured by Bag-of-Segments features correlate with current clinical knowledge: low-grade ovarian tumors are related to aneuploidy events associated with mitotic errors, while high-grade ovarian tumors are induced by DNA repair gene malfunction.

The proposed data-driven method obtains high accuracy under various parametrizations in the ovarian serous carcinoma study, indicating that it has good generalization potential for other CNA classification problems. This method could be applied to the more difficult task of classifying ovarian serous carcinomas with ambiguous histology, or those in which a low-grade tumor co-exists with a high-grade tumor. The closer genomic relationship of these tumor samples to low or high grade may provide important clinical value.
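
The abstract does not spell out how Bag-of-Segments features are built or which learning algorithm is used, so the following Python sketch only illustrates the general shape of such a pipeline and is not the authors' implementation. It assumes, hypothetically, that each sample is a list of CNA segments given as (length in bases, log2 copy ratio), that a fixed-length vector is obtained by histogramming segments over length and ratio bins, and that a nearest-centroid classifier stands in for the unspecified machine learning model. The bin edges and the toy samples are invented for illustration; only the leave-one-out evaluation scheme is taken from the abstract.

```python
from math import dist

# ASSUMPTION: a segment is (length_in_bases, log2_copy_ratio). The abstract
# does not define the segment representation; this is a hypothetical one.

def bag_of_segments(segments, length_edges=(1e6, 1e7), ratio_edges=(-0.2, 0.2)):
    """Histogram a variable number of segments into a fixed-length vector.

    ASSUMPTION: the bin edges are illustrative, not taken from the paper.
    """
    def bin_index(value, edges):
        # Number of edges the value reaches or exceeds = index of its bin.
        return sum(value >= edge for edge in edges)

    n_ratio_bins = len(ratio_edges) + 1
    counts = [0.0] * ((len(length_edges) + 1) * n_ratio_bins)
    for length, ratio in segments:
        row = bin_index(length, length_edges)
        col = bin_index(ratio, ratio_edges)
        counts[row * n_ratio_bins + col] += 1
    total = sum(counts) or 1.0
    # Normalize so the vector does not depend on the number of segments.
    return [c / total for c in counts]

def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: predict the label whose mean vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: dist(centroids[label], x))

def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        correct += nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]
    return correct / len(features)

# Synthetic toy samples, invented purely to exercise the code.
toy_samples = [
    [(5e7, 0.4), (8e7, -0.5)],
    [(6e7, 0.3), (9e7, 0.5), (4e7, -0.4)],
    [(7e7, -0.3)],
    [(5e5, 0.6), (2e6, -0.7), (8e5, 0.5), (3e6, 0.9)],
    [(4e5, -0.6), (9e5, 0.8), (5e6, -0.5)],
    [(6e5, 0.4), (7e5, -0.9), (2e6, 0.7), (3e5, 0.5)],
]
toy_labels = ["low", "low", "low", "high", "high", "high"]

toy_features = [bag_of_segments(sample) for sample in toy_samples]
toy_accuracy = leave_one_out_accuracy(toy_features, toy_labels)
print(f"leave-one-out accuracy on toy data: {toy_accuracy:.2f}")
```

Leave-one-out cross-validation suits small cohorts because every sample is used for testing exactly once while all remaining samples are used for training. The accuracy printed here describes only the synthetic toy data and says nothing about the results reported in the study.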