
Detecting Deception in Text: A Corpus-Driven Approach


Abstract

Deception is a pervasive psycholinguistic phenomenon, ranging from lies during legal trials to fabricated online reviews. Its identification has been studied for centuries, from the ancient Chinese method of spitting dry rice to the modern polygraph. The recent proliferation of deceptive online reviews has increased the need for automatic deception filtering systems. Although human performance at detecting deception is in general at chance level, previous research suggests that the linguistic signals resulting from conscious deception are sufficient for building automatic systems capable of distinguishing deceptive documents from truthful ones. Our interest is in identifying the invariant traits of deception in text, and we argue that these encouraging results in automatic deception detection are mainly due to the side effects of corpus-specific features. This poses no harm to practical applications, but it does not foster a deeper investigation of deception. To demonstrate this and to allow researchers and practitioners to share results, we have developed the largest publicly available shared multidimensional deception corpus for online reviews, the BLT-C (Boulder Lies and Truths Corpus). In an attempt to overcome the inherent lack of ground truth, we have also developed a set of semi-automatic techniques to ensure corpus validity. This thesis shows that detecting deception using supervised machine learning methods is brittle. Experiments conducted using this corpus show that accuracy changes across different kinds of deception (e.g., lying vs. fabrication) and text content dimensions (e.g., sentiment), demonstrating the limitations of previous studies. Preliminary results confirm statistical separation between fabricated and truthful reviews (although not as large as in other studies), but we do not observe any separation between truths and lies, which suggests that lying is a much more difficult class of deception to identify than fabricated spam reviews.

Bibliographic record

  • Author: Salvetti, Franco
  • Year: 2012
  • Original format: PDF
