Source: Springer Open Choice

Authorship identification of documents with high content similarity

Abstract

Our work is inspired by the task of associating segments of text with their real authors. We focus on analyzing how humans judge different writing styles, since understanding this process makes it possible to simulate such behavior. Unlike most work in this field (e.g. authorship attribution and plagiarism detection), which relies on content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of authors. We therefore conducted two pilot studies to determine whether humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one carried out by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features. In the second, the evaluators described their process and the features on which they based their judgments. The findings of our detailed analysis could (1) help to improve algorithms for tasks such as automatic authorship attribution and plagiarism detection, (2) assist forensic experts or linguists in creating profiles of writers, (3) support intelligence applications that analyze aggressive and threatening messages, and (4) help editors enforce conformity with, for instance, a journal-specific writing style.