Uncovering Machine Learning-Ready Data from Public Clinical Trial Resources: A case-study on normalization across Aggregate Content of ClinicalTrials.gov

Abstract

The state of clinical data is a barrier to the development of machine learning models to improve healthcare. Uncontrolled clinical free text is common in both patient and clinical trial data: the resulting spelling and grammatical errors, phrasing variation, and other variability make the data difficult to leverage. As part of our effort to harmonize the Aggregate Analysis of ClinicalTrials.gov (AACT) drop-withdrawal reasons to a controlled vocabulary, we explored two solutions. The first used Elastic's fuzzy matching capability to match entries in the AACT Drop-Withdrawal table to a list of user-specified terms (74.6% coverage). The second was a custom pipeline employing NLP preprocessing, Levenshtein distance (fuzzy matching), and semantic similarity mapping using a pre-trained FastText model (98% coverage). Although manual oversight is still required, the effort needed to harmonize with a controlled vocabulary is notably reduced. This work enables the rapid harmonization of clinical databases, allowing them to be leveraged for machine learning and analytics.
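
The preprocessing and Levenshtein steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the vocabulary terms, the `max_ratio` threshold, and all function names are hypothetical, and the FastText semantic-similarity stage is left out because it needs a pre-trained model. The sketch only shows how free-text withdrawal reasons might be normalized, matched to a controlled term by edit distance, and summarized as a coverage fraction.

```python
import re

# Hypothetical controlled vocabulary; the term list used in the paper is not given here.
VOCABULARY = [
    "adverse event",
    "lost to follow-up",
    "withdrawal by subject",
    "protocol violation",
]


def normalize(text):
    """Lowercase, replace runs of non-alphanumerics with a space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def levenshtein(a, b):
    """Edit distance by dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def match_term(reason, vocabulary=VOCABULARY, max_ratio=0.25):
    """Return the closest vocabulary term, or None if nothing is close enough.

    max_ratio is an assumed threshold on edit distance divided by the
    length of the longer string.
    """
    cleaned = normalize(reason)
    if not cleaned:
        return None
    best_term, best_ratio = None, 1.0
    for term in vocabulary:
        target = normalize(term)
        ratio = levenshtein(cleaned, target) / max(len(cleaned), len(target))
        if ratio < best_ratio:
            best_term, best_ratio = term, ratio
    return best_term if best_ratio <= max_ratio else None


def coverage(reasons, vocabulary=VOCABULARY):
    """Fraction of free-text reasons that map to some vocabulary term."""
    if not reasons:
        return 0.0
    matched = sum(1 for r in reasons if match_term(r, vocabulary) is not None)
    return matched / len(reasons)
```

In a fuller pipeline, entries that return `None` here would be the candidates for a semantic-similarity fallback and for manual review.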

Bibliographic details

Source
IEEE International Conference on Bioinformatics and Biomedicine | 2020 | pp. 2965-2967 | 3 pages
Authors
Emmette R. Hutchison; Youyi Zhang; Sreenath Nampally; Jim Weatherall; Faisal Khan; Khader Shameer
Original format: PDF
Keywords
Protocols; Clinical trials; Databases; Vocabulary; Semantics; Research and development; Natural language processing;

Date indexed: 2022-08-26 13:54:13

Similar literature

1. Does the low prevalence affect the sample size of interventional clinical trials of rare diseases? An analysis of data from the aggregate analysis of clinicaltrials.gov [J]. Siew Wan Hee, Adrian Willis, Catrin Tudur Smith. Orphanet Journal of Rare Diseases, 2017, Issue 1.
2. Clinical trials in peripheral vascular disease: pipeline and trial designs: an evaluation of the ClinicalTrials.gov database [J]. Sumeet Subherwal, Manesh R Patel, Karen Chiswell. Circulation: An Official Journal of the American Heart Association, 2014, Issue 20.
3. From ClinicalTrials.gov trial registry to an analysis-ready database of clinical trial results [J]. Cepeda M.S., Lobanov V., Berlin J.A. Clinical Trials: Journal of the Society for Clinical Trials, 2013, Issue 2.
4. Characteristics of Drug Combination Therapy in Oncology by Analyzing Clinical Trial Data on ClinicalTrials.gov [C]. Menghua Wu, Marina Sirota, Atul J. Butte. Pacific Symposium on Biocomputing, 2015.
5. Predicting Individual Treatment Effect from Randomized Clinical Trial Data: A Nested Cross-Validation Evaluation Framework for Machine Learning Methods [D]. Liu, Yu. 2021.
6. How Frequently Do the Results from Completed US Clinical Trials Enter the Public Domain? A Statistical Analysis of the ClinicalTrials.gov Database [O]. Hiroki Saito, Christopher J. Gill.
7. Does the low prevalence affect the sample size of interventional clinical trials of rare diseases? An analysis of data from the aggregate analysis of clinicaltrials.gov [O]. Siew Wan Hee, Adrian Willis, Catrin Tudur Smith. 2017.
