Source: U.S. National Institutes of Health literature, SpringerPlus

Automatic topic identification of health-related messages in online health community using text classification


Abstract

To facilitate patient involvement in online health communities and help patients obtain the informational and emotional support they need, this paper proposes a topic identification approach that automatically identifies the topics of health-related messages in an online health community, thereby helping patients efficiently reach the messages most relevant to their queries. We present a feature-based classification framework for automatic topic identification. We first collected messages related to a set of predefined topics in an online health community. We then combined three types of features (n-gram-based features, domain-specific features, and sentiment features) to build four feature sets for representing health-related text. Finally, we adopted three text classification techniques, C4.5, Naïve Bayes, and SVM, to evaluate our topic classification model. Comparing the feature sets and classification techniques, we found that n-gram-based, domain-specific, and sentiment features were all effective in distinguishing different types of health-related topics. In addition, feature reduction based on information gain also improved topic classification performance. Among the classification techniques, SVM significantly outperformed C4.5 and Naïve Bayes. The experimental results demonstrated that the proposed approach could identify the topics of online health-related messages efficiently.
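The information-gain feature reduction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy messages, the two topic labels, and every function name below are hypothetical, and the sketch assumes simple whitespace tokenization and binary presence/absence n-gram features. It scores each word unigram and bigram by how much knowing its presence reduces the entropy of the topic labels, then keeps the top-scoring features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngrams(text, n_max=2):
    """Set of word n-grams (n = 1..n_max) in a lowercased message."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def information_gain(docs, labels, feature):
    """Information gain of a binary presence/absence feature."""
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    total = len(labels)
    # Entropy of the labels that remains after splitting on the feature.
    conditional = sum(
        (len(part) / total) * entropy(part)
        for part in (with_f, without_f)
        if part
    )
    return entropy(labels) - conditional


def select_features(messages, labels, k):
    """Keep the k n-grams with the highest information gain."""
    docs = [ngrams(m) for m in messages]
    vocab = set().union(*docs)
    # Sort by descending gain; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


# Hypothetical toy data, not taken from the paper.
MESSAGES = [
    "chemo side effects nausea",
    "radiation side effects fatigue",
    "feeling scared and alone",
    "so scared before surgery",
]
LABELS = ["treatment", "treatment", "emotional", "emotional"]

if __name__ == "__main__":
    print(select_features(MESSAGES, LABELS, k=4))
```

In this toy set, n-grams that occur in every message of one topic and in none of the other receive the maximum gain, while an n-gram that appears in a single message scores lower. The retained features would then be passed to a classifier such as SVM, Naïve Bayes, or C4.5, which this sketch does not cover.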
