Conference: IEEE International Conference on Advanced Computing

Gender Classification of Blog Authors: With Feature Engineering and Deep Learning using LSTM Networks

Abstract

In this paper, we present two approaches to automatically classifying the gender of blog authors. The first is a system based on manual feature extraction that incorporates two novel feature classes, variable-length character sequence patterns and thirteen new word classes, along with an added class of surface features. The second is the first application to this task of a memory variant of Recurrent Neural Networks, namely Bidirectional Long Short-Term Memory networks (BLSTMs). We report results on two blog data sets: the first is a well-explored one used by the previous state-of-the-art model, while the other is a corpus 20 times larger. For the first system, we use voting among machine learning classifiers to improve accuracy over previous feature-mining systems on the former data set. Using our second approach, we show that the accuracy obtained with such deep LSTMs is comparable to that of the current state-of-the-art deep learning system for gender classification. Finally, we carry out a comparative study of the performance of both systems on the two data sets.
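
As a rough illustration of two ideas named in the abstract, variable-length character sequence features and voting among classifiers, the sketch below uses only the Python standard library. It is not the authors' implementation: the function names `char_ngrams` and `majority_vote`, the n-gram length range of 2 to 4, and the "F"/"M" labels are all assumptions made for this example, and the abstract does not specify how the paper's patterns are mined or how its votes are combined.

```python
from collections import Counter


def char_ngrams(text, min_n=2, max_n=4):
    """Count every character sequence of length min_n..max_n in text.

    The length range is an assumption for illustration only.
    """
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical usage: features from one post, and three classifier outputs.
features = char_ngrams("haha so cute", min_n=2, max_n=3)
label = majority_vote(["F", "M", "F"])
```

In a full system the n-gram counts would be fed, together with word-class and surface features, to several trained classifiers whose per-post predictions are then passed to the voting step.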