International Conference on Text, Speech and Dialogue

Adaptation of Algorithms for Medical Information Retrieval for Working on Russian-Language Text Content

Abstract

The paper investigates how algorithms for detecting adverse drug reactions (ADR) can be adapted to Russian-language text. In general, ADR detection consists of four steps: (1) data collection from social media; (2) classification or filtering of text segments that assert an ADR; (3) extraction of ADR mentions from those segments; (4) analysis of the extracted ADR mentions for signal generation. Implementing each step for Russian is harder than for English, chiefly because the necessary databases and specialized language resources are lacking and because of the complex grammatical structure of Russian. The authors present several methods of adapting machine learning algorithms to overcome these difficulties. For step 3, an ensemble classifier applied to Russian-language text forums achieved an accuracy of 0.805. For step 4, on Russian-language electronic health records (EHR), an adaptation of pyConTextNLP achieved an F-measure of 0.935, and an adaptation of the ConText algorithm achieved an F-measure of 0.92-0.95. A method for performing step 4 in full was developed using cue-based and rule-based approaches; it achieved an F-measure of 67.5%, which is comparable to the baseline.
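The cue-based approach mentioned in the abstract can be illustrated with a minimal Python sketch of ConText-style negation scoping. This is not the authors' implementation and does not use the pyConTextNLP API. The cue lists, the fixed token window, and the function names `tokenize` and `negated_terms` are all assumptions made for illustration.

```python
import re

# Hypothetical, illustrative cue lists; not taken from the paper or pyConTextNLP.
NEGATION_CUES = {"не", "нет", "без", "отрицает"}
TERMINATION_CUES = {"но", "однако"}
WINDOW = 5  # assumed maximum negation scope, in tokens after a cue


def tokenize(text):
    """Lowercase the text and split it into word tokens (Cyrillic included)."""
    return re.findall(r"\w+", text.lower())


def negated_terms(text, target_terms):
    """Return the target terms that fall inside the scope of a negation cue.

    Simplification: the scope is a fixed forward window that ends early at a
    termination cue; sentence boundaries and morphology are ignored.
    """
    negated = set()
    remaining = 0  # tokens left in the current negation scope
    for token in tokenize(text):
        if token in NEGATION_CUES:
            remaining = WINDOW
        elif token in TERMINATION_CUES:
            remaining = 0
        elif remaining > 0:
            if token in target_terms:
                negated.add(token)
            remaining -= 1
    return negated


# "The patient denies nausea but reports a rash."
example = "Пациент отрицает тошноту, но отмечает сыпь"
print(negated_terms(example, {"тошноту", "сыпь"}))
```

Matching exact word forms, as this sketch does, is fragile for a richly inflected language such as Russian. A practical adaptation would likely lemmatize both the text and the term list first.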
