Conference: International Conference on Analysis of Images, Social Networks and Texts

Authorship Verification on Short Text Samples Using Stylometric Embeddings

Abstract

Given the increasing amount of textual data published online and our inability to reliably identify a person by their writing style, impersonation on social media has become a real-world problem. This work explores how deep learning and metric learning techniques can be applied to authorship verification: given a collection of text samples by one author and another document of unknown origin, determine whether the new document was written by the same author. Using fastText word embeddings, deep LSTMs, and triplet loss, we propose a system that learns stylometric embeddings of documents and measures the stylistic distance between them. Unlike most approaches, which work on entire documents, our system works on very short text samples of 1-3 sentences, comparable in length to typical social media posts. We evaluated our approach on the PAN 2014 authorship verification challenge for English text. The presented system outperforms competing approaches in the PAN 2014 challenge when given 10 or more short text samples.
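The abstract names triplet loss and a stylistic distance between embeddings but gives no formulas, so the following is only a minimal sketch of how those two ideas are commonly written down. Everything specific in it is an assumption rather than the authors' method: the Euclidean distance, the margin value, averaging the distance over the known samples, and the decision threshold. The toy vectors stand in for embeddings that an encoder (here, fastText plus LSTMs) would produce.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard triplet loss: pull same-author samples together and push
    different-author samples at least `margin` further away.
    The margin of 1.0 is an illustrative assumption."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def mean_distance(known: Sequence[Vector], unknown: Vector) -> float:
    """Average distance from the unknown sample to each known-author sample
    (hypothetical aggregation; the abstract does not specify one)."""
    return sum(euclidean(k, unknown) for k in known) / len(known)


def same_author(known: Sequence[Vector], unknown: Vector,
                threshold: float) -> bool:
    """Accept the unknown sample if it lies close enough to the known ones.
    The threshold is a hypothetical tuning parameter."""
    return mean_distance(known, unknown) <= threshold
```

In this sketch, adding more known samples only changes the average in `mean_distance`; how the actual system combines multiple short samples is not stated in the abstract.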
