

Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach



Abstract

In this paper, we propose a rule-based engine composed of high-quality and interpretable regular expressions for medical text classification. The regular expressions are auto-generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods achieve high-quality performance in most Natural Language Processing (NLP) applications, their solutions are regarded as uninterpretable “black boxes” by humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, constructing regular expressions by hand can be extremely labor-intensive for large data sets. This research aims to reduce the manual effort while maintaining high-quality solutions. The Pool-based Simulated Annealing method is proposed to automatically optimize the performance of machine-generated regular expressions without human intervention. The proposed method is tested on real-life data provided by one of China’s largest online medical platforms. Experimental results show that the proposed PSA method further improves the performance of the initial machine-generated regular expressions compared with other meta-heuristics such as Genetic Programming. We also believe that the proposed method can serve as a vital complementary tool for existing machine learning approaches in text classification applications when high levels of interpretability of the solutions are required.
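The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: a small pool of candidate regular expressions is refined by simulated annealing against labelled text. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data, the vocabulary, the F1 objective, the "toggle one word" move, the pool update rule, and the geometric cooling schedule.

```python
import math
import random
import re

# Hypothetical toy data: 1 = cardiology-related, 0 = other.
DATA = [
    ("chest pain and shortness of breath", 1),
    ("irregular heartbeat after exercise", 1),
    ("heart palpitations at night", 1),
    ("skin rash on both arms", 0),
    ("persistent cough and sore throat", 0),
    ("knee pain after running", 0),
]
# Hypothetical vocabulary from which regex alternatives are drawn.
VOCAB = ["chest", "heart", "pain", "cough", "rash", "breath", "knee"]


def to_regex(words):
    """Build a readable alternation anchored at a word start."""
    return r"\b(?:" + "|".join(re.escape(w) for w in sorted(words)) + ")"


def f1(words):
    """F1 score of the rule 'text matches the regex -> positive' on DATA."""
    if not words:
        return 0.0
    pattern = re.compile(to_regex(words))
    tp = sum(1 for text, y in DATA if y == 1 and pattern.search(text))
    fp = sum(1 for text, y in DATA if y == 0 and pattern.search(text))
    fn = sum(1 for text, y in DATA if y == 1 and not pattern.search(text))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def neighbour(words, rng):
    """Assumed move operator: toggle one vocabulary word in or out."""
    return words ^ {rng.choice(VOCAB)}


def pool_sa(pool_size=4, steps=300, t0=1.0, cooling=0.98, seed=0):
    """Anneal a pool of candidate word sets; return the best one seen."""
    rng = random.Random(seed)
    pool = [frozenset([rng.choice(VOCAB)]) for _ in range(pool_size)]
    best = max(pool, key=f1)
    temp = t0
    for _ in range(steps):
        i = rng.randrange(pool_size)          # pick one pool member
        cand = neighbour(pool[i], rng)        # perturb it
        delta = f1(cand) - f1(pool[i])
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            pool[i] = cand
            if f1(cand) > f1(best):
                best = cand
        temp *= cooling                       # geometric cooling
    return best


if __name__ == "__main__":
    best = pool_sa()
    print(to_regex(best), f1(best))
```

The learned rule stays human-readable because each candidate is just an alternation of words, which is the kind of interpretability the abstract argues for; a real system would need a richer regex grammar and a held-out evaluation set.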


Bibliographic details

Source: IEEE Congress on Evolutionary Computation, 2020, pp. 1-7 (7 pages)
Authors: Chaofan Tu; Menglin Cui;
Original format: PDF
Keywords: simulated annealing; regular expression; medical text classification

