Conference: IEEE Congress on Evolutionary Computation

Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach

Abstract

In this paper, we propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification. The regular expressions are generated automatically by a constructive heuristic method and then optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods achieve high-quality performance in most Natural Language Processing (NLP) applications, their solutions are regarded as uninterpretable “black boxes” by humans. Rule-based methods are therefore often introduced when interpretable solutions are needed, especially in the medical field. However, constructing regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual effort while maintaining high-quality solutions. The PSA method is proposed to automatically optimize the performance of machine-generated regular expressions without human intervention. The proposed method is tested on real-life data provided by one of China’s largest online medical platforms. Experimental results show that the proposed PSA method further improves the performance of the initial machine-generated regular expressions compared with other meta-heuristics such as Genetic Programming. We also believe that the proposed method can serve as a vital complementary tool for existing machine learning approaches in text classification applications when a high level of interpretability is required.
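
The abstract does not give the details of the PSA algorithm, so the following is only a rough Python sketch of the general idea: simulated annealing run over a small pool of candidate regular expressions, scored by F1 on labeled text. Everything in it is an assumption made for illustration and not the authors' method: the keyword-alternation rule format, the toggle-one-keyword neighbourhood, the pool update, the cooling schedule, the toy English data, and all names (`f1_score`, `to_pattern`, `neighbour`, `pool_sa`, `SAMPLES`, `VOCAB`).

```python
import math
import random
import re

# Toy labeled data (hypothetical): True marks the class the rule should match.
SAMPLES = [
    ("persistent cough and fever", True),
    ("high fever since monday", True),
    ("dry cough at night", True),
    ("knee pain after running", False),
    ("skin rash on the arm", False),
    ("mild headache and knee pain", False),
]
VOCAB = ["cough", "fever", "knee", "rash", "pain", "night", "headache"]


def f1_score(pattern, samples):
    """F1 of the rule 'text is positive iff pattern matches' on (text, label) pairs."""
    if not pattern:
        return 0.0
    rx = re.compile(pattern)
    tp = fp = fn = 0
    for text, label in samples:
        hit = rx.search(text) is not None
        if hit and label:
            tp += 1
        elif hit:
            fp += 1
        elif label:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def to_pattern(keywords):
    """Build an alternation regex from a set of literal keywords."""
    return "|".join(re.escape(k) for k in sorted(keywords))


def neighbour(keywords, vocab, rng):
    """Assumed move: toggle one vocabulary word in or out of the keyword set."""
    return keywords ^ {rng.choice(vocab)}


def pool_sa(samples, vocab, pool_size=4, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Anneal a pool of keyword sets; return the best regex seen and its F1."""
    rng = random.Random(seed)
    pool = [frozenset({rng.choice(vocab)}) for _ in range(pool_size)]
    scores = [f1_score(to_pattern(k), samples) for k in pool]
    best_i = max(range(pool_size), key=scores.__getitem__)
    best, best_score = pool[best_i], scores[best_i]
    temp = t0
    for _ in range(steps):
        i = rng.randrange(pool_size)          # pick one pool member to perturb
        cand = neighbour(pool[i], vocab, rng)
        cand_score = f1_score(to_pattern(cand), samples)
        delta = cand_score - scores[i]
        # Metropolis rule: always accept non-worsening moves, sometimes worse ones.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            pool[i], scores[i] = cand, cand_score
            if cand_score > best_score:
                best, best_score = cand, cand_score
        temp *= cooling                       # geometric cooling
    return to_pattern(best), best_score


pattern, score = pool_sa(SAMPLES, VOCAB)
print(pattern, score)
```

The resulting rule stays human-readable, which is the property the abstract emphasizes: on this toy data the pattern `cough|fever` separates the two classes, and a reviewer can see at a glance why a text was classified as it was.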