Journal: BMC Medical Research Methodology

Machine learning in medicine: a practical introduction to natural language processing

Abstract

Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text data, using freely available software.

We performed three NLP experiments using publicly available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity.

Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems; a common theme in the reviews within this cluster was that the drugs took weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664, 0.776] for the SVM.

In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software.
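To make two of the ideas above concrete, here is a minimal Python sketch of lexicon-based sentiment scoring, followed by the accuracy, sensitivity and specificity calculations used to compare classifiers. It is not the paper's code or pipeline. The tiny lexicon, the four example reviews, their ratings, and the function names `sentiment_score` and `classification_metrics` are all hypothetical and chosen only for illustration. Treating a score above zero as a "positive" prediction is likewise an assumption made here to link the two steps.

```python
import re

# Hypothetical toy lexicon (assumption): published sentiment lexicons
# contain thousands of scored words, not eight.
LEXICON = {
    "good": 1, "great": 1, "better": 1, "helped": 1,
    "bad": -1, "worse": -1, "pain": -1, "nausea": -1,
}


def sentiment_score(text):
    """Sum the lexicon weights of the lowercase word tokens in a review."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is absent.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Invented example reviews with invented ratings (1 = positive, 0 = negative).
reviews = [
    ("This helped a lot, I feel great", 1),
    ("Made my nausea worse", 0),
    ("Good at first but the pain came back", 1),
    ("Bad headaches, no better", 0),
]

y_true = [rating for _, rating in reviews]
# Assumption: predict "positive" only when the net sentiment score is above zero.
y_pred = [1 if sentiment_score(text) > 0 else 0 for text, _ in reviews]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The third and fourth reviews each contain one positive and one negative lexicon word, so they score zero and are predicted negative. This shows a known weakness of simple word counting: it ignores context such as negation ("no better").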