Journal: International Journal of Computers & Applications

A discriminative random sampling strategy with individual-author feature selection for writeprint recognition of Chinese texts


Abstract

Automatic authorship recognition has become a novel technique for investigating cybercrime. The main challenge, however, is that even a moderate-sized corpus yields a huge number of features, which leads to over-training (overfitting). Moreover, a single feature set is often not enough to distinguish between candidate authors. In this paper, we propose a random-sampling ensemble method with individual-author feature selection to exploit the high-dimensional feature space. The whole feature set is partitioned into individual-author feature sets (IAFSs), and writing-style features are randomly sampled from each IAFS. Each author's IAFS is selected heuristically using that author's training set. Multiple base classifiers (BCs) are then trained on the sampled feature sets, and their outputs are fused to reach a final decision. Experiments on real-world Chinese forum data show that the proposed method is more robust than conventional ensemble methods. An analysis of ensemble diversity further shows that the proposed strategy is more effective than random subspace methods and constructs more diverse BCs.
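The abstract does not specify the authors' feature-selection heuristic, base-classifier type, or fusion rule, so the following is only a minimal, hypothetical sketch of the pipeline it describes. It assumes a one-vs-rest Fisher-style score to build each author's IAFS, nearest-centroid base classifiers, and majority-vote fusion; it also includes average pairwise disagreement, one common diversity measure that is not necessarily the one used in the paper. All function and class names (`select_iafs`, `build_ensemble`, `NearestCentroid`, etc.) are illustrative and not taken from the paper.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_score(values_in, values_out):
    """One-vs-rest Fisher-style separability of one feature (assumed stand-in heuristic)."""
    gap = (mean(values_in) - mean(values_out)) ** 2
    return gap / (pvariance(values_in) + pvariance(values_out) + 1e-9)


def select_iafs(X, y, k):
    """Hypothetical individual-author feature selection: the k best features per author."""
    iafs = {}
    for author in sorted(set(y)):
        scores = []
        for j in range(len(X[0])):
            col_in = [x[j] for x, label in zip(X, y) if label == author]
            col_out = [x[j] for x, label in zip(X, y) if label != author]
            scores.append((fisher_score(col_in, col_out), j))
        # Highest-scoring features first; keep their indices as this author's IAFS.
        iafs[author] = [j for _, j in sorted(scores, reverse=True)[:k]]
    return iafs


class NearestCentroid:
    """Base classifier restricted to a feature subspace (assumed BC type)."""

    def __init__(self, features):
        self.features = features
        self.centroids = {}

    def fit(self, X, y):
        for author in set(y):
            rows = [[x[j] for j in self.features] for x, label in zip(X, y) if label == author]
            self.centroids[author] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        v = [x[j] for j in self.features]

        def sq_dist(author):
            return sum((a - b) ** 2 for a, b in zip(v, self.centroids[author]))

        return min(self.centroids, key=sq_dist)


def build_ensemble(X, y, n_classifiers=10, k=4, m=2, seed=0):
    """Each BC sees the union of m features sampled from every author's IAFS (m <= k)."""
    rng = random.Random(seed)
    iafs = select_iafs(X, y, k)
    ensemble = []
    for _ in range(n_classifiers):
        subspace = sorted({j for feats in iafs.values() for j in rng.sample(feats, m)})
        ensemble.append(NearestCentroid(subspace).fit(X, y))
    return ensemble


def predict(ensemble, x):
    """Fuse the BC outputs by majority vote (assumed fusion rule)."""
    return Counter(bc.predict_one(x) for bc in ensemble).most_common(1)[0][0]


def disagreement(ensemble, X):
    """Mean pairwise disagreement rate between BCs; needs at least two BCs."""
    preds = [[bc.predict_one(x) for x in X] for bc in ensemble]
    pairs = [(a, b) for i, a in enumerate(preds) for b in preds[i + 1:]]
    return mean(sum(p != q for p, q in zip(a, b)) / len(X) for a, b in pairs)


if __name__ == "__main__":
    # Toy data: authors 0, 1, 2 are marked by a large value in feature 0, 1, 2.
    X = [
        [5.0, 0.1, 0.2, 1.0, 0.5, 0.3], [5.2, 0.2, 0.1, 0.9, 0.4, 0.2],
        [0.1, 5.1, 0.3, 1.1, 0.6, 0.2], [0.2, 4.9, 0.2, 1.0, 0.5, 0.4],
        [0.3, 0.2, 5.0, 0.9, 0.4, 0.3], [0.1, 0.3, 5.3, 1.2, 0.6, 0.2],
    ]
    y = [0, 0, 1, 1, 2, 2]
    ensemble = build_ensemble(X, y, n_classifiers=5, k=3, m=2)
    print(predict(ensemble, [4.8, 0.2, 0.2, 1.0, 0.5, 0.3]))  # → 0
    print(disagreement(ensemble, X))
```

Replacing the per-author sampling in `build_ensemble` with uniform sampling over all feature indices would turn the same loop into a plain random subspace method (RSM), the baseline named in the abstract; `disagreement` could then be used to compare the diversity of the two ensembles on the same data.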

Bibliographic details

  • Author affiliations (identical for all six authors)

    National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China

  • Indexed in: Engineering Index (EI), USA
  • Original format: PDF
  • Full-text language: English
  • Keywords

    Writeprint recognition; individual-author feature set (IAFS); random subspace method (RSM); class separability measure; diversity

  • Date added to database: 2022-08-18 00:38:55
