Journal: Journal of Computational and Theoretical Nanoscience

Incorporating Lexical Knowledge via WordNet to Latent Dirichlet Allocation in Offensive Message Detection




We propose a model for offensive message detection in political discourse that combines topic modeling with lexicon-based approaches to knowledge extraction. We develop an extension of Latent Dirichlet Allocation (LDA) suited to offensive message detection by leveraging lexical and semantic word features. Our model uses an externally supplied lexicon and WordNet, a lexical database, to incorporate prior knowledge into LDA. At the document level, we model the semantic relationship between a limited list of concepts with political orientation and corpus-determined themes. At the topic level, we incorporate a lexical word prior based on the WordNet lexical relationships between an externally supplied list of offensive words and the topics generated from the corpus. Our model presumes a set of preselected labels that document themes should fit. We test our model on several datasets and compare its performance against several baselines. The experiments confirm the effectiveness of our approach in both prediction and classification tasks.
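The topic-level idea of a lexical word prior can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table is a hand-made stand-in for WordNet lookups, and the names `TOY_SYNONYMS`, `expand_lexicon`, `build_word_prior`, and the `base` and `boost` values are all hypothetical choices made for illustration. The sketch expands a seed list of offensive words through one-hop synonyms, then builds an asymmetric Dirichlet prior that gives those words extra pseudo-counts in one designated topic.

```python
# Hypothetical toy data standing in for WordNet synonym lookups.
# A real system would query a lexical database instead of this dict.
TOY_SYNONYMS = {
    "idiot": {"fool", "moron"},
    "liar": {"fibber"},
}


def expand_lexicon(seed_words, synonyms):
    """Return the seed words plus their one-hop synonyms."""
    expanded = set(seed_words)
    for word in seed_words:
        expanded |= synonyms.get(word, set())
    return expanded


def build_word_prior(vocab, lexicon, num_topics, offensive_topic=0,
                     base=0.01, boost=1.0):
    """Build a num_topics x len(vocab) matrix of Dirichlet pseudo-counts.

    Every entry starts at `base`. Words found in `lexicon` receive
    `base + boost` in the designated offensive topic only, which nudges
    that topic toward lexicon words without forbidding other words.
    """
    prior = [[base] * len(vocab) for _ in range(num_topics)]
    for j, word in enumerate(vocab):
        if word in lexicon:
            prior[offensive_topic][j] = base + boost
    return prior


# Assumed toy vocabulary and seed list, for illustration only.
vocab = ["tax", "vote", "fool", "idiot", "policy", "fibber"]
lexicon = expand_lexicon({"idiot", "liar"}, TOY_SYNONYMS)
prior = build_word_prior(vocab, lexicon, num_topics=2)
```

A matrix of this shape could then be supplied as the topic-word prior to an LDA implementation that accepts one, for example the `eta` argument of gensim's `LdaModel`. How the paper actually derives and weights its prior is not stated in the abstract.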


