Effects of Pre-trained Word Embeddings on Text-based Deception Detection

Abstract

With e-commerce transforming the way individuals and businesses conduct trade, online reviews have become an important source of information for consumers. With 93% of shoppers relying on online reviews to make purchasing decisions, the credibility of those reviews deserves careful scrutiny. While deceptive text has proven difficult for humans to detect, machines have been shown to distinguish truthful from deceptive online information more reliably by applying pattern analysis to large amounts of data. In this work, we examine several popular pre-trained word embeddings (Word2Vec, GloVe, fastText) combined with deep neural network models (CNN, BiLSTM, CNN-BiLSTM) to determine how the choice of word embedding influences deception-detection accuracy. Some pre-trained word embeddings have been shown to reduce classification accuracy compared with embeddings trained on the domain-specific data. By combining a CNN and a BiLSTM with fastText pre-trained word embeddings, we achieved an accuracy of 88.8 percent on the hotel review dataset published by Ott et al. in 2011.
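The step the abstract compares can be sketched as follows. That step is initializing a network's embedding layer from pre-trained vectors (Word2Vec, GloVe, fastText) rather than learning it from the domain data alone. This is a minimal, hypothetical illustration, not the authors' code.

It assumes vectors stored in the space-separated text format used by GloVe files and fastText `.vec` files. The function names, the toy vectors, the `<pad>` token, and the random initialization range for out-of-vocabulary words are all assumptions.

```python
import io
import random
from typing import Dict, List, TextIO, Tuple


def load_text_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ... vd' lines (GloVe / fastText .vec text style).

    Lines with two or fewer fields (e.g. a fastText '<count> <dim>' header,
    or a blank line) are skipped.
    """
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    vocab: List[str], vectors: Dict[str, List[float]], dim: int, seed: int = 0
) -> Tuple[List[List[float]], int]:
    """Return one row per vocabulary word, plus the number of pre-trained hits.

    Assumption: '<pad>' maps to a zero row, and words missing from the
    pre-trained vectors get a small random row to be learned from domain data.
    """
    rng = random.Random(seed)
    matrix: List[List[float]] = []
    hits = 0
    for word in vocab:
        vec = vectors.get(word)
        if word == "<pad>":
            matrix.append([0.0] * dim)
        elif vec is not None and len(vec) == dim:
            matrix.append(list(vec))  # copy the pre-trained vector
            hits += 1
        else:
            # Out-of-vocabulary word: hypothetical uniform init in [-0.25, 0.25]
            matrix.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return matrix, hits


# Toy example: a fastText-style header line followed by two 3-d vectors.
sample = io.StringIO("2 3\nhotel 0.1 0.2 0.3\nroom -0.4 0.5 0.6\n")
vectors = load_text_vectors(sample)
vocab = ["<pad>", "hotel", "room", "amazing"]
matrix, hits = build_embedding_matrix(vocab, vectors, dim=3)
print(hits, len(matrix))  # 2 4
```

In a setup like the one described, a matrix of this kind would seed the embedding layer of the CNN, BiLSTM, or CNN-BiLSTM model. Whether that layer is then frozen or fine-tuned on the review data is a separate design choice that this sketch does not settle.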