Venue: International Conference on Language Resources and Evaluation

Similarity Ranking as an Attribute for Machine Learning Approach to Authorship Identification


Abstract

In the authorship identification task, short writing samples from N authors are given, together with an anonymous document written by one of these N authors. The task is to determine the authorship of the anonymous text. Practically all approaches solve this problem with machine learning methods. The input attributes for the machine learning process are usually formed from stylistic or grammatical properties of individual documents, or from a defined similarity between a document and an author. In this paper, we present the results of an experiment that extends the machine learning attributes by ranking the similarity between a document and an author: we transform the similarity between an unknown document and one of the N authors into the rank that author holds when all N authors are ordered by their similarity to the document. Similarity probability and similarity ranking were compared using the Support Vector Machines algorithm. The results show that machine learning methods perform slightly better with attributes based on similarity ranking than with the previously used similarity between an author and a document.
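
The rank transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the tie-breaking rule, or the feature layout, so everything here is an assumption made for illustration: the function name `similarity_to_rank`, the author labels, the scores, and the choice to break ties alphabetically are all hypothetical.

```python
from typing import Dict


def similarity_to_rank(similarities: Dict[str, float]) -> Dict[str, int]:
    """Map each author's similarity score to its rank (1 = most similar).

    Assumption: ties are broken alphabetically by author label so the
    result is deterministic; the paper's actual tie handling is unknown.
    """
    # Order authors by descending similarity, then by label for ties.
    ordered = sorted(similarities, key=lambda a: (-similarities[a], a))
    return {author: position for position, author in enumerate(ordered, start=1)}


# Hypothetical similarity scores between one anonymous document
# and each of N = 4 candidate authors.
scores = {"author_a": 0.41, "author_b": 0.87, "author_c": 0.12, "author_d": 0.55}

ranks = similarity_to_rank(scores)
print(ranks)
```

In a setup like the one the abstract describes, the rank values (rather than the raw similarity scores) would be passed to the classifier as input attributes. One plausible motivation is that ranks are insensitive to the absolute scale of the similarity measure, though the abstract itself does not state this.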
