Conference: Cyber Security Cryptography and Machine Learning

Stylometric Authorship Attribution of Collaborative Documents

Abstract

Stylometry is the study of writing style based on linguistic features and is typically applied to authorship attribution problems. In this work, we apply stylometry to a novel dataset of multi-authored documents collected from Wikia using both relaxed classification with a support vector machine (SVM) and multi-label classification techniques. We define five possible scenarios and show that one, the case where labeled and unlabeled collaborative documents by the same authors are available, yields high accuracy on our dataset while the other, more restrictive cases yield lower accuracies. Based on the results of these experiments and knowledge of the multi-label classifiers used, we propose a hypothesis to explain this overall poor performance. Additionally, we perform authorship attribution of pre-segmented text from the Wikia dataset, and show that while this performs better than multi-label learning it requires large amounts of data to be successful.
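
As a rough illustration of the kind of pipeline the abstract describes, the sketch below extracts one common stylometric feature (character n-gram frequencies) and scores single-author predictions against multi-authored documents. It is a minimal sketch, not the authors' implementation: the function names `char_ngram_profile` and `relaxed_accuracy` are hypothetical, the SVM and multi-label classifiers are omitted, and the scoring rule (a prediction counts as correct if the predicted author is any one of the document's true co-authors) is an assumed reading of "relaxed classification" that the abstract does not spell out.

```python
from collections import Counter


def char_ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams in `text`.

    Character n-grams are a widely used stylometric feature; the abstract
    does not state which features the authors actually used.
    """
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def relaxed_accuracy(predicted, true_author_sets):
    """Fraction of documents whose predicted author is one of the true co-authors.

    ASSUMPTION: this "any co-author counts" rule is one plausible reading of
    relaxed classification, not a definition taken from the paper.
    """
    if not predicted:
        return 0.0
    hits = sum(
        1 for author, authors in zip(predicted, true_author_sets)
        if author in authors
    )
    return hits / len(predicted)


# Toy usage with made-up author names and documents.
profile = char_ngram_profile("The wiki page was edited twice.")
score = relaxed_accuracy(
    ["alice", "bob"],
    [{"alice", "carol"}, {"dave"}],
)
```

In a fuller pipeline, profiles like these would be turned into feature vectors and passed to a classifier such as an SVM, and the relaxed score would be compared against stricter multi-label metrics that require the whole author set to be recovered.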
