This paper describes the National Research Council of Canada’s submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.
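The one-classifier-per-emotion idea can be illustrated with a minimal sketch. It is not the submitted system: the bag-of-words features, the plain hinge-loss update, the hyperparameters, the toy sentences, and all function names below are assumptions made for illustration only. Each emotion gets its own binary large-margin classifier, and a sentence receives every label whose classifier scores it above zero, so it may end up with zero, one, or several labels.

```python
def featurize(sentence):
    """Binary bag-of-words features plus a bias term (illustrative choice)."""
    feats = {"__bias__": 1.0}
    for token in sentence.lower().split():
        feats[token] = 1.0
    return feats


def score(weights, feats):
    """Linear score: dot product of the weight vector and the features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_binary(examples, epochs=10, lr=1.0):
    """Train one linear classifier with hinge-loss subgradient updates.

    examples: list of (feature dict, y) pairs with y in {+1, -1}.
    An update fires whenever the margin y * score is below 1.
    """
    weights = {}
    for _ in range(epochs):
        for feats, y in examples:
            if y * score(weights, feats) < 1.0:
                for name, value in feats.items():
                    weights[name] = weights.get(name, 0.0) + lr * y * value
    return weights


def train_one_per_label(sentences, label_sets, labels):
    """Train an independent binary classifier for each emotion label."""
    featurized = [featurize(s) for s in sentences]
    model = {}
    for label in labels:
        examples = [
            (feats, 1 if label in gold else -1)
            for feats, gold in zip(featurized, label_sets)
        ]
        model[label] = train_binary(examples)
    return model


def predict(model, sentence):
    """Return every label whose classifier gives a positive score."""
    feats = featurize(sentence)
    return {label for label, w in model.items() if score(w, feats) > 0.0}


# Hypothetical toy data, not drawn from the challenge corpus.
toy_sentences = [
    "i love you all",
    "i am so sorry",
    "i love you and i am sorry",
    "please pay the rent",
]
toy_labels = [{"love"}, {"guilt"}, {"love", "guilt"}, set()]
toy_model = train_one_per_label(toy_sentences, toy_labels, ["love", "guilt"])
```

Because each label is trained separately, the positive/negative balance can be handled per emotion (for example by reweighting examples for rare labels), which is one plausible reading of why this design simplifies label balance issues.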