Source: Journal of Computational and Theoretical Nanoscience

Incorporating Lexical Knowledge via WordNet to Latent Dirichlet Allocation in Offensive Message Detection


Abstract

We propose a model for offensive message detection in political discourse that combines topic modeling with lexicon-based approaches to knowledge extraction. We develop an extension of latent Dirichlet allocation (LDA) suited to offensive message detection by leveraging lexical and semantic word features. Our model uses an externally supplied lexicon and WordNet, a lexical database, to incorporate prior knowledge into LDA. At the document level, we model the semantic relationship between a limited list of politically oriented concepts and corpus-determined themes. At the topic level, we incorporate a lexical word prior based on the WordNet lexical relationships between an externally supplied list of offensive words and the topics generated from the corpus. Our model presumes a set of preselected labels that document themes should fit. We test our model on several datasets and compare its performance against several baselines. The experiments confirm the effectiveness of our approach in both prediction and classification tasks.
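As a rough illustration of the topic-level idea only (not the authors' implementation), one common way to inject lexical knowledge into LDA is to expand a seed lexicon through synonym relations and give the resulting words extra Dirichlet prior mass in one topic. Everything named below is hypothetical: the `TOY_SYNONYMS` table stands in for WordNet lookups, and the function names and the `base` and `boost` values are assumptions chosen for the sketch.

```python
# Hypothetical stand-in for WordNet synonym relations; a real system would
# query WordNet instead of using this hand-written mapping.
TOY_SYNONYMS = {
    "idiot": {"fool", "moron"},
    "liar": {"fraud"},
}


def expand_lexicon(seed_words, synonyms):
    """Return the seed words together with their listed synonyms."""
    expanded = set(seed_words)
    for word in seed_words:
        expanded |= synonyms.get(word, set())
    return expanded


def build_topic_word_prior(vocab, num_topics, lexicon,
                           seeded_topic=0, base=0.01, boost=1.0):
    """Build a num_topics x len(vocab) matrix of Dirichlet pseudo-counts.

    Every entry starts at `base`; vocabulary words found in `lexicon`
    receive `base + boost` in the row for `seeded_topic` (assumed values).
    """
    prior = [[base] * len(vocab) for _ in range(num_topics)]
    for j, word in enumerate(vocab):
        if word in lexicon:
            prior[seeded_topic][j] = base + boost
    return prior


vocab = ["tax", "vote", "fool", "liar", "policy", "moron"]
lexicon = expand_lexicon(["idiot", "liar"], TOY_SYNONYMS)
prior = build_topic_word_prior(vocab, num_topics=2, lexicon=lexicon)
```

A matrix like this could be supplied as the topic-word prior to an LDA implementation that accepts per-topic, per-word priors, nudging the seeded topic toward the lexicon's vocabulary while leaving the other topics symmetric.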
