Conference: International Symposium on Cyber Security Cryptography and Machine Learning

Stylometric Authorship Attribution of Collaborative Documents

Abstract

Stylometry is the study of writing style based on linguistic features and is typically applied to authorship attribution problems. In this work, we apply stylometry to a novel dataset of multi-authored documents collected from Wikia, using both relaxed classification with a support vector machine (SVM) and multi-label classification techniques. We define five possible scenarios and show that one, the case where labeled and unlabeled collaborative documents by the same authors are available, yields high accuracy on our dataset, while the other, more restrictive cases yield lower accuracies. Based on the results of these experiments and knowledge of the multi-label classifiers used, we propose a hypothesis to explain this overall poor performance. Additionally, we perform authorship attribution of pre-segmented text from the Wikia dataset and show that, while this performs better than multi-label learning, it requires large amounts of data to be successful.
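The abstract names relaxed classification without defining it. One common reading, assumed here and not confirmed by this page, is that a prediction counts as correct when a true author appears among the classifier's k highest-ranked candidates. The sketch below illustrates that scoring rule for multi-authored documents. The function names, the author names, and the score values are all hypothetical, and the scores stand in for whatever per-author confidence an SVM would produce.

```python
from typing import Dict, List, Set


def top_k_authors(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate authors with the highest classifier scores."""
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda author: (-scores[author], author))
    return ranked[:k]


def relaxed_accuracy(
    doc_scores: List[Dict[str, float]],
    true_authors: List[Set[str]],
    k: int,
) -> float:
    """Fraction of documents with at least one true author among the top k.

    Assumption: this is one plausible definition of relaxed accuracy for
    collaborative documents, not necessarily the one used in the paper.
    """
    if len(doc_scores) != len(true_authors):
        raise ValueError("doc_scores and true_authors must have equal length")
    if not doc_scores:
        return 0.0
    hits = sum(
        1
        for scores, authors in zip(doc_scores, true_authors)
        if authors & set(top_k_authors(scores, k))
    )
    return hits / len(doc_scores)


# Hypothetical per-author scores (e.g. SVM decision values) for two documents.
doc_scores = [
    {"alice": 0.9, "bob": 0.4, "carol": -0.2},
    {"alice": 0.1, "bob": 0.3, "carol": 0.8},
]
# Each document may have several true authors (the multi-authored setting).
true_authors = [{"bob", "carol"}, {"alice"}]

print(relaxed_accuracy(doc_scores, true_authors, k=1))
print(relaxed_accuracy(doc_scores, true_authors, k=2))
```

Raising k makes the criterion more lenient, so on the same scores the relaxed accuracy can only stay level or rise.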
