Data mining and analysis of lung cancer data.

Abstract

Lung cancer is the leading cause of cancer death in the United States and worldwide, with more than 1.3 million deaths per year globally. Because effective tools for diagnosing lung cancer are lacking, more than half of all cases are diagnosed at an advanced stage, when surgical resection is unlikely to be feasible. The main purpose of this study is to examine the relationship between patient outcomes and the conditions of patients undergoing different treatments for lung cancer, and to develop models that predict lung cancer mortality. The study identifies the demographic, financial, and clinical factors related to the diagnosis of, or mortality from, lung cancer in order to support decision-making by physicians and patients.

Two primary data sets were used: the Nationwide Inpatient Sample (NIS) and the Thomson Medstat MarketScan data. For the NIS, kernel density estimation was used to visualize the relationships among age, length of stay, diagnosis categories, total cost, and lung cancer. For the Medstat data, the Kaplan-Meier method and the Cox proportional hazards model were used to examine the relationships between these factors and the target variable in more detail. Time series and predictive models were used to forecast total cost for hospital decision-making, to predict lung cancer mortality from historical data, and to generate rules for identifying lung cancer diagnoses.

Older patients are more likely to have lung cancer that leads to a longer stay and higher treatment costs. Among the 7 diagnosis clusters defined for lung cancer, the cluster for malignant neoplasm of the lobe, bronchus, or lung carries a higher risk.
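
Kernel density estimation, as applied to the NIS data, turns a sample such as patient ages into a smooth density curve: each observation contributes a small kernel "bump" and the bumps are averaged. The following is a minimal sketch in plain Python, assuming a Gaussian kernel and a normal-reference rule-of-thumb bandwidth; the abstract does not state which kernel or bandwidth the study used, and the function names and ages below are hypothetical.

```python
import math
import statistics


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 1.06 * s * n**(-1/5) (assumes roughly normal data)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a bandwidth")
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth):
    """Return f_hat(x) = 1/(n*h) * sum_i K((x - x_i) / h) with a standard normal kernel K."""
    n = len(samples)
    if n == 0 or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, then normalize so the area is 1.
        return scale * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in samples
        )

    return density


# Hypothetical ages of inpatients with a lung cancer diagnosis (illustrative only).
ages = [52, 58, 61, 64, 66, 67, 69, 71, 72, 74, 76, 79, 83]
h = normal_reference_bandwidth(ages)
f = gaussian_kde(ages, h)
for age in range(50, 91, 10):
    print(f"age {age}: estimated density {f(age):.4f}")
```

The bandwidth controls smoothness: a smaller h follows individual observations more closely, while a larger h blurs them into a broader curve. The rule of thumb above is only a starting point for visual exploration.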
Age, length of stay, admission type, diagnosis clusters, treatment-procedure clusters, and Major Diagnostic Categories (MDC) were identified as significant factors for lung cancer mortality.

We combined Text Miner and cluster analysis to identify lung cancer claims and to categorize the diagnoses, treatment procedures, and medication treatments of those patients. The claims data were also used to define severity levels and treatment categories. Compared with using diagnosis codes directly, combining text mining with cluster analysis is more efficient and captures more useful information for further analysis. To analyze lung cancer mortality, we found survival analysis to be an appropriate way to prepare and model the data, because it relates a predictor variable of interest to the time until an event occurs. The proportional hazards model measured the effects of the different treatment clusters (treatment procedures or medication treatments) as hazard ratios, and the proportional effect of a treatment cluster may vary over time. Finally, a decision tree was built to generate rules for identifying high-risk lung cancer cases within the general inpatient population.
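
The Kaplan-Meier method estimates the probability of surviving past time t when some patients are censored, that is, still alive when their follow-up ends. The estimator is a running product over death times, S(t) = product over t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i is the number still at risk just before t_i. The following is a minimal sketch of that product-limit estimator in plain Python. It assumes each patient has been reduced to a follow-up time and a death/censoring flag. The function name and the toy data are hypothetical and are not taken from the MarketScan data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function S(t).

    times:  follow-up time per patient (e.g. months from diagnosis)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns [(t, S(t)), ...] at each distinct time with at least one death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]  # Counter returns 0 for times with no deaths
        if d:
            # Multiply in the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Patients censored at t still count as at risk at t, then leave.
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data in months; event 0 marks a censored patient.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored patients never lower the curve directly. They only shrink the risk set for later death times, which is why the estimator uses the available follow-up without treating censoring as death.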
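
The hazard ratios used to compare treatment clusters come from the Cox proportional hazards model. Its standard form is shown below, with an illustrative binary covariate x_j marking membership in one treatment cluster and x_{-j} denoting the remaining covariates. These symbols are assumptions for exposition, not the dissertation's own notation.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \frac{h(t \mid x_j = 1,\ \mathbf{x}_{-j})}{h(t \mid x_j = 0,\ \mathbf{x}_{-j})} = e^{\beta_j}
```

The baseline hazard h_0(t) cancels in the ratio, so HR_j is the same at every time t. For example, beta_j = 0.4 gives HR_j = e^0.4 ≈ 1.49, which corresponds to a roughly 49% higher hazard of death at every time point. If the effect of a treatment cluster changes over time, this proportionality assumption fails for that covariate. One common way to represent such an effect is a time-varying coefficient, for example beta_j(t) = beta_j + gamma_j log t, which gives HR_j(t) = e^(beta_j) t^(gamma_j).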

Bibliographic details

  • Author

    Tang, Guoxin

  • Author affiliation

    University of Louisville

  • Degree-granting institution: University of Louisville
  • Subjects: Applied Mathematics; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 150
  • Original format: PDF
  • Language: English