Conference: IEEE International Conference on Big Data

Explainable Authorship Verification in Social Media via Attention-based Similarity Learning

Abstract

Authorship verification is the task of analyzing the linguistic patterns of two or more texts to determine whether they were written by the same author. Traditionally, the analysis is performed by experts who consider linguistic features such as spelling mistakes, grammatical inconsistencies, and stylistics. Machine learning algorithms can be trained to accomplish the same task, but they have traditionally relied on so-called stylometric features. The disadvantage of such features is that their reliability is greatly diminished for short and topically varied social media texts. In this interdisciplinary work, we propose a substantial extension of a recently published hierarchical Siamese neural network approach, which makes it feasible to learn neural features and to visualize the decision-making process. For this purpose, we compile a new large-scale corpus of short Amazon reviews for text comparison research, and we show that the Siamese network topologies outperform state-of-the-art approaches built on stylometric features. Our linguistic analysis of the network's internal attention weights shows that the proposed method is indeed able to latch on to some traditional linguistic categories.
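
To make the idea of attention-based similarity learning concrete, here is a minimal sketch in plain Python. It is not the authors' hierarchical architecture, and nothing in it is trained. The `embed` function, the context vector, the embedding size, and the decision threshold are all hypothetical placeholders. The sketch only illustrates the general pattern the abstract describes: one shared encoder applied to both texts, a distance between the two encodings, and per-token attention weights that can be inspected afterwards.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed(token, dim=8):
    """Hypothetical stand-in for learned embeddings.

    Returns a deterministic pseudo-random vector per token string.
    """
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(tokens, context, dim=8):
    """Shared (Siamese) encoder: attention-weighted average of token vectors.

    Assumes `tokens` is non-empty. Returns the pooled vector and the
    attention weights, one weight per token.
    """
    vectors = [embed(t, dim) for t in tokens]
    # Attention score of each token: dot product with the context vector.
    scores = [sum(x * c for x, c in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def distance(a, b):
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(text_a, text_b, context, threshold=0.5):
    """Apply the same encoder to both texts and threshold their distance.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    enc_a, att_a = encode(text_a.lower().split(), context)
    enc_b, att_b = encode(text_b.lower().split(), context)
    d = distance(enc_a, enc_b)
    return d < threshold, d, att_a, att_b


if __name__ == "__main__":
    context = embed("__context__")  # hypothetical attention context vector
    same, d, att_a, _ = verify(
        "great product works well",
        "arrived late and the box was damaged",
        context,
    )
    print("same author (untrained guess):", same, "distance:", round(d, 3))
    for token, weight in zip("great product works well".split(), att_a):
        print(f"{token:10s} {weight:.3f}")
```

In a trained system the embeddings and the context vector would be learned from pairs of texts labeled as same-author or different-author, and the printed attention weights are the kind of quantity a linguistic analysis could then examine.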
