Reddit: A Gold Mine for Personality Prediction

Abstract

Automated personality prediction from social media is gaining increasing attention in the natural language processing and social science communities. However, due to high labeling costs and privacy issues, the few publicly available datasets are of limited size and low topic diversity. We address this problem by introducing a large-scale dataset derived from Reddit, a source so far overlooked for personality prediction. The dataset is labeled with Myers-Briggs Type Indicators (MBTI) and comes with a rich set of features for more than 9k users. We carry out a preliminary feature analysis, revealing marked differences between the MBTI dimensions and poles. Furthermore, we use the dataset to train and evaluate benchmark personality prediction models, achieving macro F1-scores between 67% and 82% on the individual dimensions and 82% accuracy for exact or one-off accurate type prediction. These results are encouraging and comparable with the reliability of standardized tests.
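The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that each MBTI type is a four-letter string with one letter per dimension, and that "one-off" means the predicted type differs from the true type in at most one letter; the abstract does not spell out either point. All function names and the toy labels are hypothetical.

```python
from typing import Dict, Sequence

# The four MBTI dimensions, in the conventional letter order (assumed here).
DIMENSIONS = ("I/E", "N/S", "T/F", "J/P")


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def per_dimension_macro_f1(
    true_types: Sequence[str], pred_types: Sequence[str]
) -> Dict[str, float]:
    """Macro F1 computed separately for each of the four letter positions."""
    return {
        name: macro_f1([t[i] for t in true_types], [p[i] for p in pred_types])
        for i, name in enumerate(DIMENSIONS)
    }


def letters_differing(a: str, b: str) -> int:
    """Number of positions at which two four-letter types disagree."""
    return sum(x != y for x, y in zip(a, b))


def exact_or_one_off_accuracy(
    true_types: Sequence[str], pred_types: Sequence[str]
) -> float:
    """Share of users whose predicted type is wrong in at most one letter
    (this reading of "one-off" is an assumption)."""
    hits = sum(letters_differing(t, p) <= 1 for t, p in zip(true_types, pred_types))
    return hits / len(true_types)


# Made-up labels for illustration only; these are not results from the dataset.
toy_true = ["INTJ", "ENFP", "ISTP", "ESFJ"]
toy_pred = ["INTJ", "ENTP", "ESFP", "INTP"]
toy_scores = per_dimension_macro_f1(toy_true, toy_pred)
toy_accuracy = exact_or_one_off_accuracy(toy_true, toy_pred)
```

Macro averaging weights both poles of a dimension equally, so it is not dominated by the more frequent pole when a dimension is imbalanced.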
