First International Conference on Networks & Soft Computing

Feature selection techniques for gender prediction from blogs


Abstract

This paper aims to identify the gender of blog authors. The features considered are POS tags, unigrams (words and punctuation), bigrams, and word classes. To select and rank features, we use mutual information, chi-square, and information gain. The dataset is a collection of 3227 blogs originally derived from a blog data set; 1679 were written by male authors and 1548 by female authors. Results were obtained using 10-fold cross-validation. Word unigrams gave the best accuracy, 78.81%, compared with the other features, and chi-square proved the best method for ranking features. Classification was performed using a Multinomial Naïve Bayes classifier and SVMs with different kernel functions: PolyKernel, Puk, Normalized PolyKernel, and RBFkernel.
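As an illustration of the chi-square ranking mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each unigram is scored by the chi-square statistic of its 2x2 term/class contingency table built from document frequencies; the abstract does not say how the statistic was computed. The function name `chi_square_scores` and the four-document toy corpus are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its 2x2 term/class table.

    Assumes exactly two classes and uses document frequencies
    (a term counts once per document). Illustrative sketch only.
    """
    n = len(docs)
    positive = sorted(set(labels))[0]  # arbitrary reference class
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == positive else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # reference-class docs containing the term
        b = df_neg[term]   # other-class docs containing the term
        c = n_pos - a      # reference-class docs without the term
        d = n_neg - b      # other-class docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term present in every document carries no class information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy corpus: tokenized posts with author-gender labels.
docs = [
    ["my", "wife", "game"],
    ["game", "code"],
    ["my", "husband", "love"],
    ["love", "shopping"],
]
labels = ["male", "male", "female", "female"]

scores = chi_square_scores(docs, labels)
# Rank terms from most to least class-discriminative (ties broken alphabetically).
ranked = sorted(scores, key=lambda t: (-scores[t], t))
```

In this toy corpus, terms that occur in only one class ("game", "love") rank highest, while "my", which is spread evenly across both classes, scores zero. A feature selection step would keep only the top-ranked terms before training the classifier.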
