International conference on mining intelligence and knowledge exploration
An Empirical Evaluation of SVM on Meta Features for Authorship Attribution of Online Texts

Abstract

Authorship attribution (AA) has been studied by many researchers. Recently, with the spread of online texts, authorship attribution of online texts has begun to receive a great deal of attention. The essence of this problem is to identify a set of features that can capture an author's writing style. However, previous studies on feature identification mainly used statistical methods and conducted experiments on small data sets (i.e., fewer than 10). This scale is far from real applications of AA for online texts. In addition, because of the special characteristics of online texts, statistical approaches are rarely applied to this problem. Since the performance of authorship identification depends heavily on the combination of features and classification method used, the feature sets for traditional authorship attribution need to be re-examined using machine learning approaches. In this paper, we evaluate the effectiveness of six types of meta features on two public data sets using SVM, a well-established machine learning technique. The experimental results show that lexical and syntactic features are the most promising features for AA of online texts. Furthermore, our experiments reveal a number of interesting findings regarding the impact of different types of features on authorship attribution.
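The abstract names SVM and lexical and syntactic meta features but does not give the exact feature definitions, kernel, or training setup. As a minimal sketch, assuming a linear SVM applied one-vs-rest over the candidate authors, the Python below shows how simple lexical style markers can feed such a classifier. The four features, the function names (`lexical_features`, `train_binary_svm`, `train_one_vs_rest`, `predict`), the toy texts, and the Pegasos-style stochastic subgradient training loop are all illustrative assumptions, not the authors' method.

```python
import re


def lexical_features(text):
    """Hypothetical lexical style markers: '!' and ',' per word, plus the
    share of short (<= 3 letters) and long (>= 7 letters) words."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)  # avoid division by zero on empty input
    short = sum(1 for w in words if len(w) <= 3)
    long_ = sum(1 for w in words if len(w) >= 7)
    return [text.count("!") / n, text.count(",") / n, short / n, long_ / n]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_binary_svm(X, y, lam=0.1, epochs=100):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent. Labels must be +1 / -1; no bias term."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            violated = yi * dot(w, xi) < 1   # hinge loss is active for this sample
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(texts, authors, **kwargs):
    """Train one binary SVM per candidate author (that author vs. all others)."""
    X = [lexical_features(t) for t in texts]
    return {
        a: train_binary_svm(X, [1 if b == a else -1 for b in authors], **kwargs)
        for a in sorted(set(authors))
    }


def predict(models, text):
    """Attribute a text to the author whose classifier gives the highest score."""
    x = lexical_features(text)
    return max(models, key=lambda a: dot(models[a], x))


# Toy usage: two hypothetical authors with deliberately different styles.
train_texts = [
    "Wow! So fun! We did it!",
    "Yes! Go go go! Ha!",
    "Consequently, committees deliberated, however, nothing changed.",
    "Nevertheless, administrators considered alternatives, cautiously.",
]
train_authors = ["alice", "alice", "bob", "bob"]
models = train_one_vs_rest(train_texts, train_authors)
print(predict(models, "Hey! Me too! Ok!"))  # alice
print(predict(models, "Fortunately, everything eventually worked, somehow."))  # bob
```

One-vs-rest is used because a binary SVM separates only two classes; each additional candidate author adds one more classifier, and the highest-scoring one wins. Swapping in other meta feature types (for example, syntactic ones) would only change `lexical_features`, which is the kind of comparison the abstract describes.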