Language-Agnostic Twitter Bot Detection

Abstract

In this paper we address the problem of detecting Twitter bots. We analyze a dataset of 8385 Twitter accounts and their tweets, comprising both humans and different kinds of bots. We use this data to train machine learning classifiers that distinguish between real and bot accounts. We identify features that are easy to extract while still providing good results. We analyze different feature groups based on account-specific, tweet-specific and behavior-specific features and measure their performance compared to other state-of-the-art bot detection methods. For easy future portability of our work we focus on language-agnostic features. With AdaBoost, the best-performing classifier, we achieve an accuracy of 0.988 and an AUC of 0.995. As the creation of good training data in machine learning is often difficult, especially in the domain of Twitter bot detection, we additionally analyze to what extent smaller amounts of training data lead to useful results by reviewing cross-validated learning curves. Our results indicate that using few but expressive features already has a good practical benefit for bot detection, especially if only a small amount of training data is available.
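The workflow the abstract describes (boosted classification on a few numeric features, scored by accuracy and AUC at increasing training-set sizes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data is synthetic, the three feature names are hypothetical and not the paper's feature set, the AdaBoost with decision stumps is written from scratch rather than taken from the authors' setup, and a single held-out set stands in for the cross-validation used in the paper.

```python
import math
import random

def make_accounts(n, seed=0):
    """Synthetic, made-up accounts: (features, label); +1 = bot, -1 = human."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        is_bot = rng.random() < 0.5
        # Hypothetical language-agnostic features, NOT the paper's feature set.
        tweets_per_day = rng.gauss(60 if is_bot else 8, 15)
        follower_ratio = rng.gauss(0.3 if is_bot else 1.2, 0.4)
        url_fraction = rng.gauss(0.7 if is_bot else 0.2, 0.2)
        features = [tweets_per_day, follower_ratio, url_fraction]
        data.append((features, 1 if is_bot else -1))
    return data

def best_stump(data, weights):
    """Return the one-feature threshold rule with the lowest weighted error."""
    best = None
    for f in range(len(data[0][0])):
        for thr in sorted({x[f] for x, _ in data}):
            for sign in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if (sign if x[f] > thr else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def train_adaboost(data, rounds=10):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, sign = best_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        weights = [w * math.exp(-alpha * y * (sign if x[f] > thr else -sign))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def score(ensemble, x):
    """Weighted vote of all stumps; positive means 'bot'."""
    return sum(a * (s if x[f] > t else -s) for a, f, t, s in ensemble)

def accuracy(ensemble, data):
    hits = sum((1 if score(ensemble, x) > 0 else -1) == y for x, y in data)
    return hits / len(data)

def auc(ensemble, data):
    """AUC as the fraction of (bot, human) pairs ranked correctly; ties count half."""
    pos = [score(ensemble, x) for x, y in data if y == 1]
    neg = [score(ensemble, x) for x, y in data if y == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

train, test = make_accounts(150, seed=0), make_accounts(200, seed=1)
# A crude learning curve: held-out accuracy and AUC for growing training sets.
for size in (20, 50, 100, 150):
    model = train_adaboost(train[:size])
    print(size, round(accuracy(model, test), 3), round(auc(model, test), 3))
```

Because the synthetic classes are well separated, the printed scores say nothing about real Twitter data; the sketch only shows how accuracy and AUC can be tracked as the amount of training data grows.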
