Journal: Biomedical Informatics Insights

Topic Categorisation of Statements in Suicide Notes with Integrated Rules and Machine Learning

Abstract

We describe and evaluate an automated approach, developed for the i2b2 2011 challenge, that identifies statements in suicide notes and categorises them into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities and on lexical, lexico-semantic and presentation features, as well as on the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach achieved a best overall micro F-measure of 53.36%. The best precision, 67.17%, was achieved when only rules were used, whereas the best recall, 50.57%, was achieved with integrated rules and machine learning. While some topics (e.g., Sorrow, Anger, Blame) proved challenging, performance for relatively frequent categories (e.g., Love) and well-scoped categories (e.g., Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.
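The micro F-measure reported above pools true positives, false positives and false negatives across all topics before computing precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes each statement carries a set of topic labels (an empty set means no topic); the function name `micro_prf` and the toy data are hypothetical.

```python
def micro_prf(gold, predicted):
    """Return micro-averaged (precision, recall, F-measure).

    gold, predicted: equal-length lists of sets of topic labels,
    one set per statement (assumed representation, not from the paper).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels predicted and present in the gold set
        fp += len(p - g)   # labels predicted but absent from the gold set
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy data for illustration only; these are not results from the paper.
gold = [{"Love"}, {"Guilt", "Hopelessness"}, set(), {"Instructions"}]
predicted = [{"Love"}, {"Guilt"}, {"Thankfulness"}, set()]

precision, recall, f_measure = micro_prf(gold, predicted)
print(f"P={precision:.4f} R={recall:.4f} F={f_measure:.4f}")
```

Because the counts are pooled, frequent topics such as Love weigh more heavily in the micro-averaged score than rare ones.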