Journal: Artificial Intelligence in Medicine

Data mining techniques for cancer detection using serum proteomic profiling

Abstract

Objective: Pathological changes in an organ or tissue may be reflected in proteomic patterns in serum. Unique serum proteomic patterns could therefore be used to discriminate cancer samples from non-cancer ones. Because proteomic profiles are complex, a higher-order analysis such as data mining is needed to uncover the differences between them. The objectives of this paper are (1) to briefly review the application of data mining techniques in proteomics for cancer detection and diagnosis; (2) to explore a novel analytic method with different feature selection methods; and (3) to compare the results obtained on different data sets with those reported by Petricoin et al., in terms of detection performance and selected proteomic patterns.

Methods and material: Three serum SELDI MS data sets were used to identify serum proteomic patterns that distinguish the serum of ovarian cancer cases from that of non-cancer controls. A support vector machine-based method was applied, with features selected either by statistical testing or by a genetic algorithm-based method. Leave-one-out cross validation with receiver operating characteristic (ROC) curves was used to evaluate and compare cancer detection performance.

Results and conclusions: The results showed that (1) data mining techniques can be successfully applied to ovarian cancer detection with reasonably high performance; (2) classification using features selected by the genetic algorithm consistently outperformed classification using features selected by statistical testing, in terms of accuracy and robustness; and (3) the discriminatory features (proteomic patterns) can differ greatly from one selection method to another. In other words, pattern selection and its classification efficiency are highly classifier-dependent. Therefore, when data mining techniques are used, the discrimination of cancer from normal does not depend solely upon the identity and origin of cancer-related proteins.
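The evaluation scheme described in the abstract (feature selection by statistical testing, then leave-one-out cross validation summarized by a ROC curve) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Welch t-statistic as the statistical test, substitutes a simple nearest-centroid scorer for the support vector machine to stay dependency-free, and runs on synthetic data. All function names, the number of selected features `k`, and the toy data generator are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, k):
    """Indices of the k features with the largest |t| between the classes."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = [welch_t([r[j] for r in pos], [r[j] for r in neg])
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def centroid_score(X, y, feats, x):
    """Stand-in scorer (NOT an SVM): higher means closer to the cancer centroid."""
    def dist(label):
        rows = [r for r, l in zip(X, y) if l == label]
        return sum((x[j] - mean(r[j] for r in rows)) ** 2 for j in feats)
    return dist(0) - dist(1)


def loocv_scores(X, y, k):
    """Leave-one-out: reselect features and refit on n-1 samples each time."""
    out = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k)  # selection stays inside the fold
        out.append(centroid_score(X_tr, y_tr, feats, X[i]))
    return out


def roc_auc(scores, y):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, l in zip(scores, y) if l == 1]
    neg = [s for s, l in zip(scores, y) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n_per_class=20, n_features=30, seed=0):
    """Synthetic 'spectra' (assumption): only features 0 and 1 differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 3.0 * label
            row[1] -= 3.0 * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = loocv_scores(X, y, k=2)
auc = roc_auc(scores, y)
print(f"LOOCV AUC on toy data: {auc:.3f}")
```

Features are reselected inside each fold so that the held-out sample never influences which features are chosen; selecting once on the full data set would bias the estimated performance upward. Swapping `centroid_score` for an SVM decision function, or `select_features` for a genetic search, leaves the rest of the loop unchanged.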