Conference: International Conference on Big Data Computing and Communications

Hierarchical Bidirectional Long Short-Term Memory Networks for Chinese Messaging Spam Filtering


Abstract

Messaging spam filtering is an important research area in natural language processing (NLP). In this paper, we propose an approach to Chinese messaging spam filtering based on a hierarchical bidirectional long short-term memory (LSTM) network. Because a message consists of sentences and a sentence consists of words, we design a hierarchical architecture that generates a message representation aggregating the information of each word in each sentence. We also observe that errors introduced by Chinese word segmentation may degrade the performance of our model, so we use unsegmented characters as input rather than segmented words, which most Chinese NLP models use. Experimental results show that our method outperforms most state-of-the-art methods on a dataset manually labeled by an online medical company. We also show that unsegmented characters perform better than segmented words on this task.
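The hierarchical, character-level input described in the abstract can be illustrated with a minimal sketch. It only covers input preparation (message to sentences to unsegmented characters), not the bidirectional LSTM layers themselves. The sentence delimiters, the `<pad>`/`<unk>` tokens, and the function names `split_sentences`, `to_characters`, `build_vocab`, and `encode` are all assumptions made for illustration; the abstract does not say how the authors split messages or build their vocabulary.

```python
import re

# Assumed sentence delimiters (not specified in the abstract): a sentence is a
# run of non-delimiter characters followed by any trailing end punctuation.
_SENTENCE = re.compile(r"[^。！？!?；;\n]+[。！？!?；;\n]*")


def split_sentences(message):
    """Split a message into sentences, keeping end punctuation attached."""
    return [s.strip() for s in _SENTENCE.findall(message) if s.strip()]


def to_characters(sentence):
    """Unsegmented input: each non-whitespace character is its own token."""
    return [ch for ch in sentence if not ch.isspace()]


def build_vocab(messages):
    """Map each character seen in the messages to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical special tokens
    for message in messages:
        for sentence in split_sentences(message):
            for ch in to_characters(sentence):
                vocab.setdefault(ch, len(vocab))
    return vocab


def encode(message, vocab):
    """Return a nested list: one list of character ids per sentence."""
    unk = vocab["<unk>"]
    return [
        [vocab.get(ch, unk) for ch in to_characters(sentence)]
        for sentence in split_sentences(message)
    ]


if __name__ == "__main__":
    corpus = ["你好！请问医生在吗？", "加微信领红包"]
    vocab = build_vocab(corpus)
    # Each inner list would feed a sentence-level encoder; the resulting
    # sentence vectors would then feed a message-level encoder.
    print(encode("你好！医生", vocab))
```

In a hierarchical model of the kind the abstract describes, each inner list would presumably be encoded into a sentence vector, and the sequence of sentence vectors into a single message vector for classification. Because no word segmenter is involved, segmentation errors cannot propagate into the input.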
