Annual conference of the International Society of Exposure Science
Data Gathering Methods for Existing Chemicals Risk Evaluations under the Amended TSCA: Proposal for Using Natural Language Processing to Prioritize the Literature Search and Review

Abstract

The amended Toxic Substances Control Act (TSCA) requires EPA to assess risks for existing chemicals. The risk evaluation must "integrate and assess available information on hazards and exposure for the conditions of use of the chemical substance, including information on specific risks of injury to health or the environment" and "describe the weight of scientific evidence for the identified hazards and exposure." To meet these requirements, EPA searched for all relevant data for the major components of an evaluation (engineering, exposure, fate, ecological hazard, and human health hazard), which produced thousands of results per chemical, all requiring manual review. In this presentation, we explore natural language processing (NLP) to automate the discovery of relevant data in the peer-reviewed literature. We describe the initial data gathering method for the first 10 chemicals undergoing risk evaluation, focusing on the strategy for the exposure, engineering, and fate disciplines. We then explain how NLP can sort references into clusters based on text patterns, so that extensive search results can be prioritized. We used a supervised form of clustering by adding a seed set (i.e., references identified a priori as relevant to the chemical) to the corpus to be clustered. We compared the clustered results for the peer-reviewed literature with the results of manually screening the same references. Specifically, we calculated the recall and elimination rate of the clustering predictions relative to the entirely manual approach of reviewing all results. This approach requires a seed set, which might not be easily obtained for future chemicals that have fewer known relevant references. To address this, we also present results obtained with a generic, chemical-agnostic seed set that could be applied to future chemical assessments. The views expressed in this abstract are those of the authors and do not represent Agency policy or endorsement.
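
The seed-set idea and the two evaluation metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or define the metrics formally, so the sketch assumes cluster assignments are already available and assumes that (a) a reference is retained for manual review when its cluster contains at least one seed reference, (b) recall is the share of manually confirmed relevant references that are retained, and (c) elimination rate is the share of non-seed references that are not retained. All function names, reference ids, and data below are hypothetical.

```python
def retained_references(cluster_of, seed_ids):
    """Non-seed references that share a cluster with at least one seed.

    cluster_of: dict mapping reference id -> cluster label (assumed given).
    seed_ids:   ids of references known a priori to be relevant.
    """
    seeds = set(seed_ids)
    seeded_clusters = {cluster_of[s] for s in seeds}
    return {ref for ref, label in cluster_of.items()
            if label in seeded_clusters and ref not in seeds}


def recall_and_elimination(cluster_of, seed_ids, manually_relevant):
    """Compare cluster-based prioritization with full manual screening.

    Definitions are assumptions of this sketch, not taken from the source:
      recall           = retained relevant refs / all relevant refs
      elimination rate = 1 - retained refs / all non-seed refs
    """
    candidates = set(cluster_of) - set(seed_ids)
    retained = retained_references(cluster_of, seed_ids)
    relevant = set(manually_relevant) & candidates
    recall = len(retained & relevant) / len(relevant) if relevant else float("nan")
    elimination = 1 - len(retained) / len(candidates) if candidates else float("nan")
    return recall, elimination


# Hypothetical toy corpus: two seeds plus eight search results in four clusters.
cluster_of = {
    "seed1": 0, "seed2": 1,
    "r1": 0, "r2": 0, "r3": 1,
    "r4": 2, "r5": 2, "r6": 2,
    "r7": 3, "r8": 3,
}
seed_ids = {"seed1", "seed2"}
manually_relevant = {"r1", "r2", "r3", "r7"}  # outcome of full manual screening

recall, elimination = recall_and_elimination(cluster_of, seed_ids, manually_relevant)
print(f"recall={recall:.3f}, elimination rate={elimination:.3f}")
```

In this toy case three of the four relevant references fall in seeded clusters, and five of the eight search results would be skipped. The trade-off between the two numbers is what a reviewer would weigh when deciding whether clustering can stand in for part of the manual screen.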
