
AUTHORSHIP ATTRIBUTION BASED ON FEATURE SET SUBSPACING ENSEMBLES



Abstract

Authorship attribution can assist criminal investigations as well as cybercrime analysis. The task can be viewed as a single-label, multi-class text categorization problem. Given that the style of a text can be represented by simple word frequencies selected with a language-independent method, machine learning techniques that can deal with high-dimensional feature spaces and sparse data can be applied directly to solve this problem. This paper focuses on classifier ensembles based on feature set subspacing. It is shown that an effective ensemble can be constructed using exhaustive disjoint subspacing, a simple method that produces many poor but diverse base classifiers. The simple model can be enhanced by a variation of the technique of cross-validated committees applied to the feature set. Experiments on two benchmark text corpora demonstrate the effectiveness of the presented method, which improves on previously reported results, and compare it to support vector machines, an alternative machine learning approach well suited to authorship attribution.
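
The ensemble construction described in the abstract can be illustrated with a minimal sketch. It reads "exhaustive disjoint subspacing" as partitioning the whole feature set into non-overlapping subsets and training one weak base classifier per subset. The abstract does not name the base learner, the way the partition is drawn, or the combination rule, so the random partition, the nearest-centroid base classifier, the majority vote, and all function names and toy data below are assumptions made for illustration, not the paper's method.

```python
import random
from collections import Counter


def disjoint_subspaces(n_features, subset_size, seed=0):
    """Shuffle the feature indices and cut them into disjoint subsets
    that together cover every feature exactly once."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)  # random partition: an assumption
    return [idx[i:i + subset_size] for i in range(0, n_features, subset_size)]


def train_centroids(X, y, features):
    """Hypothetical base learner: per-author mean vector over `features`."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, features, row):
    """Return the author whose centroid is nearest (squared Euclidean)."""
    def dist(center):
        return sum((row[f] - center[j]) ** 2 for j, f in enumerate(features))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def fit_ensemble(X, y, subset_size, seed=0):
    """Train one base classifier on each disjoint feature subset."""
    subspaces = disjoint_subspaces(len(X[0]), subset_size, seed)
    return [(feats, train_centroids(X, y, feats)) for feats in subspaces]


def predict_ensemble(ensemble, row):
    """Combine base predictions by majority vote (an assumed rule);
    ties are broken by label order."""
    votes = Counter(predict_one(c, feats, row) for feats, c in ensemble)
    return min(votes, key=lambda label: (-votes[label], label))


# Toy word-frequency vectors for two authors (made-up numbers).
X = [[5, 0, 4, 1, 0, 3], [4, 1, 5, 0, 1, 4],
     [0, 4, 1, 5, 3, 0], [1, 5, 0, 4, 4, 1]]
y = ["A", "A", "B", "B"]

ensemble = fit_ensemble(X, y, subset_size=2)
pred_a = predict_ensemble(ensemble, [5, 1, 4, 0, 0, 3])
pred_b = predict_ensemble(ensemble, [0, 5, 1, 4, 4, 0])
```

Each base classifier sees only two features here, so it is individually weak, but because the subsets do not overlap the classifiers err on different evidence, which is the diversity the abstract refers to.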
