Authorship identification of documents with high content similarity



Abstract

Our work is motivated by the task of associating segments of text with their real authors. We focus on analyzing how humans judge different writing styles; this analysis can help to better understand the process and thus to simulate or mimic such behavior. Unlike the majority of work in this field (e.g. authorship attribution, plagiarism detection), which uses content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of authors. We therefore conducted two pilot studies to determine whether humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features; in the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (1) help to improve algorithms such as automatic authorship attribution and plagiarism detection, (2) assist forensic experts or linguists in creating profiles of writers, (3) support intelligence applications in analyzing aggressive and threatening messages, and (4) help editors enforce conformity with, for instance, a journal-specific writing style.
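The distinction the abstract draws between content features and stylometric (content-agnostic) features can be illustrated with a minimal sketch. This is not the authors' method: the function-word list, the feature set, and the names `stylometric_features` and `content_similarity` are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical, illustrative function-word list; not taken from the paper.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def tokenize(text):
    """Return lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def stylometric_features(text):
    """Content-agnostic features: function-word rates, mean sentence and word length."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    features = {f"fw_{w}": counts[w] / n for w in FUNCTION_WORDS}
    features["mean_sentence_len"] = len(tokens) / max(len(sentences), 1)
    features["mean_word_len"] = sum(len(t) for t in tokens) / n
    return features


def content_similarity(text_a, text_b):
    """Cosine similarity over bag-of-words counts, with function words removed."""
    a = Counter(t for t in tokenize(text_a) if t not in FUNCTION_WORDS)
    b = Counter(t for t in tokenize(text_b) if t not in FUNCTION_WORDS)
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Under these assumptions, two documents on the same topic would score high on `content_similarity`, so any remaining signal for telling their authors apart would have to come from features like those in `stylometric_features`.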


Bibliographic record

Journal: Springer Open Choice
Authors: Andi Rexha; Mark Kröll; Hermann Ziak; Roman Kern
Year (volume), issue: -1 (115), 1
Year: -1
Pages: 223–237
Total pages: 15
Original format: PDF
CLC classification: Surgery
Keywords: Writing style analysis; Content agnostic stylometry; High content similarity; Authorship identification


