Journal of Medical Internet Research

Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing


Abstract

Background: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field, and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. Previously reported results on a medical named entity annotation task showed a 0.68 F-measure agreement between crowdsourced and traditionally-developed corpora.

Objective: Building upon previous work from general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain, with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora.

Methods: To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work and tested for statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations.

Results: The agreement between the crowd's annotations and the traditionally-generated corpora was high for (1) annotations (F-measure 0.87 for medication names; 0.73 for medication types) and (2) correction of previous annotations (0.90 for medication names; 0.76 for medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowdsourced and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task.

Conclusions: This study offers three contributions. First, we showed that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collecting high-quality annotations for clinical text (when protected health information is excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower's quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations generated by the traditional and crowdsourced approaches.
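The Methods and Results describe two concrete computations: aggregating redundant crowd judgments by simple voting, and scoring the aggregated annotations against the traditionally-developed gold standard with sensitivity, precision, and F-measure. The sketch below illustrates both steps on hypothetical span-level data; it is not the authors' released JavaScript/CML infrastructure, and the data structures and toy values are assumptions.

```python
# Minimal sketch, assuming each worker's judgment is a set of entity spans
# (start, end, label). Not the released CrowdFlower/CML code.
from collections import Counter

def majority_vote(judgments):
    """Aggregate one unit's redundant crowd judgments by simple voting:
    keep a span if more than half of the workers marked it."""
    votes = Counter(span for worker in judgments for span in worker)
    quorum = len(judgments) / 2
    return {span for span, n in votes.items() if n > quorum}

def evaluate(crowd, gold):
    """Span-level sensitivity (recall), precision, and F-measure."""
    tp = len(crowd & gold)   # spans found by both crowd and gold standard
    fp = len(crowd - gold)   # crowd spans absent from the gold standard
    fn = len(gold - crowd)   # gold spans the crowd missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = (2 * precision * sensitivity / (precision + sensitivity)
         if precision + sensitivity else 0.0)
    return sensitivity, precision, f

# Toy example: three workers annotate medication names in one CTA sentence.
workers = [
    frozenset({(10, 17, "MED"), (42, 49, "MED")}),
    frozenset({(10, 17, "MED")}),
    frozenset({(10, 17, "MED"), (42, 49, "MED")}),
]
gold = {(10, 17, "MED"), (42, 49, "MED")}

crowd = majority_vote(workers)
print(evaluate(crowd, gold))  # -> (1.0, 1.0, 1.0) on this toy example
```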
