Data Quality Improvement of a Multicenter Clinical Trial Dataset

Abstract

Medical datasets are often affected by problems such as missing values, inconsistencies, and redundancies, which can impair the data mining process and the extraction of useful knowledge. A preprocessing phase should therefore be performed to improve the overall quality of the data and, consequently, of the information that may be discovered from them. In this study we applied five data preprocessing steps to improve the quality of a large dataset derived from a multicenter clinical trial. The dataset included 298 patients enrolled in a prospective, multicenter clinical trial and was characterized by 22 input variables and one class variable (MIPI value). First, data from the different medical centers were integrated to obtain a homogeneous dataset. This dataset was then normalized to scale all variables into smaller, similar intervals. Next, all missing values were estimated in an imputation step. Finally, the complete dataset was discretized and reduced to remove redundant variables and decrease the amount of data to be managed. The improvement in data quality after each step was evaluated by the accuracy of patient classification with a KNN classifier. Our results showed that the proposed pipeline increased classification performance by more than 20%. The largest gain in accuracy was obtained after missing value imputation, whereas the discretization and feature selection steps significantly reduced the number of variables to be managed without any loss of the information contained in the data.
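The preprocessing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization, imputation, or discretization methods were used, so min-max scaling, column-mean imputation, equal-width binning, Euclidean distance, and leave-one-out evaluation are all assumptions made here for illustration. The data are synthetic, the function names are hypothetical, and the integration and feature selection steps are omitted.

```python
import math
import random
from collections import Counter


def min_max_normalize(rows):
    """Scale each column to [0, 1]; missing values (None) are left untouched."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        lo, hi = min(present), max(present)
        span = (hi - lo) or 1.0  # constant column: avoid division by zero
        for r in out:
            if r[j] is not None:
                r[j] = (r[j] - lo) / span
    return out


def impute_mean(rows):
    """Replace each missing value with its column mean (an assumed, simple imputer)."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        mean = sum(present) / len(present)
        for r in out:
            if r[j] is None:
                r[j] = mean
    return out


def discretize(rows, bins=3):
    """Equal-width binning of values that are already scaled to [0, 1]."""
    return [[min(int(v * bins), bins - 1) for v in r] for r in rows]


def knn_loo_accuracy(rows, labels, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier."""
    correct = 0
    for i, x in enumerate(rows):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(x, y), labels[j]) for j, y in enumerate(rows) if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


# Synthetic two-class data with very different column scales and some gaps.
random.seed(0)
rows, labels = [], []
for i in range(40):
    label = i % 2
    row = [random.gauss(label * 3.0, 1.0) * scale for scale in (1.0, 100.0, 0.01)]
    if i % 7 == 0:
        row[1] = None  # simulate a missing measurement
    rows.append(row)
    labels.append(label)

normalized = min_max_normalize(rows)
complete = impute_mean(normalized)
binned = discretize(complete, bins=3)

# Accuracy is re-measured after each step, mirroring the evaluation idea.
print("accuracy after imputation:    ", knn_loo_accuracy(complete, labels))
print("accuracy after discretization:", knn_loo_accuracy(binned, labels))
```

Re-running the classifier after each step is what lets a gain or loss in accuracy be attributed to that step. The numbers printed for this toy data say nothing about the figures reported in the study.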

Bibliographic record

Source
Annual International Conference of the IEEE Engineering in Medicine and Biology Society | 2017 | pp. 1070-1667 | 4 pages
Conference location
Authors
Gian Maria Zaccaria; Samanta Rosati; Cristina Castagneri; Simone Ferrero; Marco Ladetto; Mario Boccadoro; Gabriella Balestra
Author affiliations
Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: Q81-53
Keywords

Similar literature

1. Finding and using routine clinical datasets for observational research and quality improvement [J]. Lucy McDonnell, Brendan C Delaney, Frank Sullivan. The British journal of general practice: the journal of the Royal College of General Practitioners. 2018, no. 668
2. Finding and using routine clinical datasets for observational research and quality improvement [J]. Lucy McDonnell, Brendan C Delaney, Frank Sullivan. The British journal of general practice: the journal of the Royal College of General Practitioners. 2018, no. 668
3. Finding and using routine clinical datasets for observational research and quality improvement [J]. McDonnell Lucy, Delaney Brendan C., Sullivan Frank. The British journal of general practice: the journal of the Royal College of General Practitioners. 2018, no. 668
4. Data Quality Improvement of a Multicenter Clinical Trial Dataset [C]. Gian Maria Zaccaria, Samanta Rosati, Cristina Castagneri, Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2017
5. Comparison of Performance of Two Clinical Scales to Assess the Post-Thrombotic Syndrome: Secondary Analysis of a Multicenter Randomized Trial of Pharmacomechanical Catheter-Directed Thrombolysis for Deep Vein Thrombosis [D]. Lee, Angela Young-Ju. 2020
6. Finding and using routine clinical datasets for observational research and quality improvement [O]. Lucy McDonnell, Brendan C Delaney, Frank Sullivan. 2018
7. Data Quality Improvement of a Multicenter Clinical Trial Dataset [O]. Zaccaria Gian Maria, Rosati Samanta, Castagneri Cristina, 2017
8. Coordinating Center Models Project: A Study of Coordinating Centers in Multicenter Clinical Trials. VI. Phases of a Multicenter Clinical Trial [R]. 1979
