Pattern Recognition Letters

τ-SS3: A text classifier with dynamic n-grams for early risk detection over text streams

Abstract

A recently introduced classifier, SS3, has been shown to be well suited to early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in CLEF's eRisk open tasks. SS3 was designed to deal with ERD problems naturally, since it supports incremental training and classification over text streams and can visually explain its rationale. However, SS3 processes the input using a bag-of-words model, which lacks the ability to recognize important word sequences. This limitation could negatively affect classification performance and also reduces the descriptiveness of the visual explanations. In standard document classification, it is very common to use word n-grams to try to overcome some of these limitations. Unfortunately, when working with text streams, using n-grams is not trivial, since the system must learn and recognize which n-grams are important "on the fly". This paper introduces τ-SS3, an extension of SS3 that allows it to recognize useful patterns over text streams dynamically. We evaluated our model on the eRisk 2017 and 2018 tasks on early depression and anorexia detection. Experimental results suggest that τ-SS3 is able to improve both current results and the richness of visual explanations. © 2020 Elsevier B.V. All rights reserved.