Data mining and analysis of lung cancer data.



Abstract

Lung cancer is the leading cause of cancer death in the United States and the world, with more than 1.3 million deaths worldwide per year. However, because effective diagnostic tools are lacking, more than half of all cases are diagnosed at an advanced stage, when surgical resection is unlikely to be feasible. The main purpose of this study is to examine the relationship between patient outcomes and the conditions of patients undergoing different treatments for lung cancer, and to develop models that predict lung cancer mortality. The study identifies the demographic, financial, and clinical factors related to the diagnosis or mortality of lung cancer to help physicians and patients in their decision-making.

Two primary data sets were used in this study: the Nationwide Inpatient Sample (NIS) and the Thomson MedStat MarketScan data. Kernel density estimation was applied to the NIS data to visualize the relationship between lung cancer and age, length of stay, diagnosis categories, and total cost. The Kaplan-Meier method and the Cox proportional hazards model were applied to the MedStat data to examine the relationship between these factors and the target variable in more detail. Time series and predictive modeling were used to predict total cost for hospital decision-making, to predict lung cancer mortality from historical data, and to generate rules for identifying a lung cancer diagnosis.

Older patients are more likely to have lung cancers that lead to a higher probability of a longer stay and higher treatment costs. Among the 7 defined clusters of lung cancer diagnoses, malignant neoplasm of lobe, bronchus, or lung is at higher risk. Age, length of stay, admit type, clusters of diagnosis, clusters of treatment procedures, and Major Diagnostic Categories (MDC) were identified as significant factors for lung cancer mortality.

We combined Text Miner and cluster analysis to identify the lung cancer claims data and to determine the categories of diagnosis, treatment procedures, and medication treatments for those patients. The claims data were also used to define severity levels and treatment categories. Compared with using diagnosis codes directly, the combination of text mining and cluster analysis is more efficient and captures more useful information for further analysis. In analyzing lung cancer mortality, we also found survival analysis appropriate for preprocessing the data to relate a predictor variable of interest to the time of an event. The proportional hazards model examined the effects of different treatment clusters using a hazard ratio; the proportional effect of a treatment cluster (treatment procedure or medication treatment) may vary with time. A decision tree was built to generate rules for identifying high-risk lung cancer cases among the regular inpatient population.
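The Kaplan-Meier method named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: the function name `kaplan_meier`, the toy follow-up times, and the event flags below are all invented for illustration, and the sketch uses only the Python standard library. It computes the product-limit estimate, multiplying the running survival probability by (1 - deaths / number at risk) at each distinct death time, while censored records only shrink the risk set.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time per patient (hypothetical units, e.g. days)
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: only observed deaths change the estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored records at time t both leave the risk set.
        at_risk -= exits[t]
    return curve


# Toy data, invented for illustration only (not from NIS or MarketScan).
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.4f}")
```

In this toy example the patient censored at time 8 does not lower the curve, but is removed from the risk set before the death at time 12, which is the distinction that separates survival analysis from a simple proportion of deaths.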


Bibliographic details

Author: Tang, Guoxin
Author affiliation: University of Louisville
Degree-granting institution: University of Louisville
Subject: Applied Mathematics; Biology Bioinformatics
Degree: Ph.D.
Year: 2010
Pages: 150
Format: PDF
Language: English

Similar literature

1. 21LBA Assessing the value of preoperative chemotherapy in early-stage non-small cell lung cancer: mature data and prognostic factors analysis of a Phase III randomized trial of surgery alone vs preoperative Paclitaxel/Carboplatin (PC) vs postoperative PC. Final NATCH data. A Spanish Lung Cancer Group Trial [J]. B. Massuti, J.M. Sanchez, G. Alonso. European Journal of Cancer Supplements, 2009, No. 3.

2. Access to care and stage at diagnosis for patients with lung cancer and esophageal cancer: analysis of the Savannah River Region Information System cancer registry data [J]. Silverstein MD, Nietert PJ, Ye X. Southern Medical Journal, 2002, No. 8.

3. TDDA, a Data Mining Tool for Text Databases: A Case History in a Lung Cancer Text Database [C]. Jeffrey A. Goldman, Wesley Chu, D. Stott Parker. Discovery Science, 1998.

4. Industrial Applications of Data Mining Engineering Effort Forecasting based on Mining and Analysis of Patterns in Historical Project Execution Data [D]. Bhattacharya, Indrani. 2013.

5. A Data Mining-Based Analysis of Core Herbs on Different Patterns (Zheng) of Non-Small Cell Lung Cancer [O]. Xiangjun Qi, Zehuai Guo, Qianying Chen. 2021.

6. Prioritizing therapeutics for lung cancer: an integrative meta-analysis of cancer gene signatures and chemogenomic data [O]. Kristen Fortney, Joshua Griesman, Max Kotlyar. 2015.
