Conference: International Conference on Analysis of Images, Social Networks, and Texts

Automated Detection of Non-Relevant Posts on the Russian Imageboard '2ch': Importance of the Choice of Word Representations

Abstract

This study considers the problem of automatically detecting non-relevant posts on Web forums. It discusses an approach that approximates this problem with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task can be addressed by training a supervised classifier on composed word embeddings of the two posts. Because success in this task could be quite sensitive to the choice of word representations, we compare the performance of different word embedding models. We train seven models (Word2Vec, GloVe, Word2Vec-f, Wang2Vec, AdaGram, FastText, Swivel), evaluate the embeddings they produce on a dataset of human judgements, and compare their performance on the task of non-relevant post detection. To make the comparison, we propose a dataset of semantic relatedness built from posts on one of the most popular Russian Web forums, the imageboard "2ch", which has challenging lexical and grammatical features.
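The idea of classifying a pair of posts from composed word embeddings can be illustrated with a small sketch. The abstract does not say how the embeddings are composed or which classifier is used, so everything below is an assumption made for illustration: the toy three-dimensional vectors in `EMBEDDINGS` are invented, post vectors are formed by averaging, the pair is composed by concatenation plus an element-wise product, and the classifier is a plain logistic regression trained by gradient descent. The helper names (`post_vector`, `pair_features`, `train_logreg`, `predict`) are hypothetical.

```python
import math

# Invented toy word vectors (assumption): a real setup would load vectors
# trained by one of the embedding models named in the abstract.
EMBEDDINGS = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "pet":    [0.7, 0.3, 0.0],
    "stock":  [0.0, 0.9, 0.2],
    "market": [0.1, 0.8, 0.3],
    "price":  [0.0, 0.7, 0.4],
}
DIM = 3


def post_vector(tokens):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def pair_features(opening_post, reply):
    """Compose two post vectors: concatenation plus element-wise product."""
    a, b = post_vector(opening_post), post_vector(reply)
    return a + b + [x * y for x, y in zip(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Estimated probability that the reply is related to the opening post."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, epochs=10000, lr=1.0):
    """Batch gradient descent on the logistic loss (no regularization)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy training pairs: (opening post, reply, label); 1 = relevant, 0 = non-relevant.
TRAIN = [
    (["cat", "pet"], ["kitten"], 1),
    (["stock", "market"], ["price"], 1),
    (["cat", "pet"], ["price", "market"], 0),
    (["stock", "market"], ["kitten", "cat"], 0),
]
X = [pair_features(op, reply) for op, reply, _ in TRAIN]
y = [label for _, _, label in TRAIN]
w, b = train_logreg(X, y)

for op, reply, label in TRAIN:
    print(op, reply, label, round(predict(w, b, pair_features(op, reply)), 3))
```

The element-wise product is included because relevance here depends on how the two posts interact: with concatenation alone, this toy data has an XOR-like structure that a linear classifier cannot separate. Swapping in a different embedding model only changes the contents of `EMBEDDINGS`, which is what makes a comparison across word representations possible under this kind of setup.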
