Journal of the American Medical Informatics Association

Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data

Abstract

Background: Machine learning algorithms have aided prognostic studies of breast cancer survivability by predicting the survival of a particular patient from historical patient data. Labeled patient records, however, are hard to collect: it takes at least 5 years to label a record as 'survived' or 'not survived', unguided trials of numerous types of oncology therapies are very expensive, and obtaining labeled records requires confidentiality agreements with doctors and patients.

Proposed method: These difficulties have led researchers to consider semi-supervised learning (SSL), a recent machine learning approach that can also use unlabeled patient data, which are relatively easy to collect. SSL is therefore regarded as a way to circumvent the known difficulties. Even with SSL, however, more labeled data still lead to better prediction. To compensate for the lack of labeled patient data, we consider assigning virtual labels, or 'pseudo-labels', to unlabeled patient data and treating them as if they were labeled.

Results: Our proposed algorithm, 'SSL Co-training', implements this concept based on SSL. It was tested on the Surveillance, Epidemiology, and End Results (SEER) database for breast cancer and delivered a mean accuracy of 76% and a mean area under the curve of 0.81.
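The pseudo-labeling idea can be illustrated with a small, generic co-training loop. This is only a sketch under stated assumptions, not the authors' SSL Co-training algorithm, whose details the abstract does not give. It assumes two nearest-centroid learners, each seeing a different feature "view", and it promotes an unlabeled record to the labeled set only when both learners agree on its label. The function names (`fit`, `predict`, `co_train`) and the toy two-feature records are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(X, y):
    """Toy nearest-centroid 'model': one centroid per class label."""
    return {c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)}

def predict(model, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda c: dist2(model[c], x))

def view(x, idx):
    """Project a record onto the feature indices of one view."""
    return [x[i] for i in idx]

def co_train(X_lab, y_lab, X_unl, view_a, view_b, max_rounds=10):
    """Generic co-training sketch (assumption, not the paper's method).

    Each round trains one learner per view on the current labeled set,
    then pseudo-labels every unlabeled record on which both learners agree.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        model_a = fit([view(x, view_a) for x in X_lab], y_lab)
        model_b = fit([view(x, view_b) for x in X_lab], y_lab)
        remaining, added = [], 0
        for x in pool:
            label_a = predict(model_a, view(x, view_a))
            label_b = predict(model_b, view(x, view_b))
            if label_a == label_b:
                # Agreement: treat the pseudo-label as if it were a real label.
                X_lab.append(x)
                y_lab.append(label_a)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    return X_lab, y_lab, pool

# Toy records with two made-up features; labels follow the abstract's wording.
X_lab = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_lab = ["not survived", "not survived", "survived", "survived"]
X_unl = [[0.1, 0.2], [0.8, 1.0], [0.1, 0.9], [1.1, 0.8]]

X_out, y_out, pool = co_train(X_lab, y_lab, X_unl, view_a=[0], view_b=[1])
```

In this toy run, three of the four unlabeled records receive pseudo-labels, while the record on which the two views disagree stays in the unlabeled pool. Requiring agreement is one simple way to limit the noise that pseudo-labels can introduce; other confidence criteria are equally possible.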