Conference: Simulation Innovation Workshop

Natural Language Processing and Web Tools for Mapping Units from ClinicalTrials.Gov



Abstract

Summary data from some clinical trials is now becoming available in electronic format, as required by US public law, through the NIH NLM database ClinicalTrials.Gov. The database currently holds more than a quarter of a million trials, about 30K of which have result records. However, despite the great work of the team that developed this fast-growing database, the data held in it is far from standardized, and using it requires effort, especially for machines. The difficulty arises because data entry to this database is manual, comes from multiple external sources, is mostly textual, and is somewhat permissive. Although entered data goes through a review process, the review is done by humans and is therefore sometimes forgiving. For machine comprehension, the definition of units is essential so that the numbers held in the database make sense. There are currently over 20K units for the 30K clinical trials with results; many of the units are synonyms and some are outright errors. Even CDISC units, which are already well standardized, need normalization and enhancement. This presentation will discuss recent advances in the effort to standardize the medical units. Specifically, Natural Language Processing (NLP) techniques are used alongside other Machine Learning methods to cluster similar units together. This allows a human to inspect and standardize the units more efficiently. The presentation will discuss in detail both the NLP techniques and the clustering techniques used. Similar techniques, combined with web tools, will be helpful for future analysis of other textual fields within the fast-growing database. To move the standardization effort forward, a web tool was created to allow humans to classify the units. Multiple users can see the units in each cluster, along with the machine suggestions, and classify them. The web portal is accessible through: ClinicalUnitMapping.com.
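The abstract does not say which NLP or clustering techniques were used, so the following is only a minimal sketch of the general idea of grouping similar unit strings for human review. It assumes character-trigram Jaccard similarity and single-link clustering. The function names, the 0.5 threshold, and the sample unit strings are all hypothetical and are not taken from the presentation or from ClinicalTrials.Gov.

```python
import re
from itertools import combinations


def normalize(unit: str) -> str:
    """Lowercase and drop whitespace/punctuation (keeping '/' and '%')."""
    return re.sub(r"[^a-z0-9/%]+", "", unit.lower())


def char_ngrams(text: str, n: int = 3) -> set:
    """Character n-grams of the space-padded text; tolerant of small typos."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if a or b else 1.0


def cluster_units(units, threshold=0.5):
    """Single-link clustering: any pair at or above threshold is merged."""
    parent = list(range(len(units)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [char_ngrams(normalize(u)) for u in units]
    for i, j in combinations(range(len(units)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, unit in enumerate(units):
        clusters.setdefault(find(i), []).append(unit)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical sample of free-text unit entries.
    sample = ["mg/dL", "mg/dl", "MG / DL", "participants",
              "Participants", "particpants", "mmHg", "mm Hg"]
    for group in cluster_units(sample):
        print(group)
```

Each printed group is a candidate set of synonyms or misspellings that a reviewer could map to a single standard unit. The pairwise comparison is quadratic in the number of units, so at a scale of tens of thousands of units a real pipeline would probably need a more scalable similarity search.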
