Conference: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

5 Sources of Clickbaits You Should Know! Using Synthetic Clickbaits to Improve Prediction and Distinguish between Bot-Generated and Human-Written Headlines

Abstract

A clickbait is an attractive yet misleading headline that lures readers into clicking through. However, the development of robust clickbait detection models has been hampered by a shortage of high-quality labeled training samples. To overcome this challenge, we investigate how to exploit human-written and machine-generated synthetic clickbaits. First, we ask crowdworkers and journalism students to write clickbaity news headlines. Second, we use deep generative models to generate clickbaity headlines. Through empirical evaluations, we demonstrate that synthetic clickbaits produced by humans and by deep generative models are consistently useful in improving the accuracy of various prediction models, by as much as 14.5% in AUC, across two real datasets and different types of algorithms. In particular, we observe an improvement in accuracy of up to 8.5% in AUC even for top-ranked clickbait detectors from the Clickbait Challenge 2017. Our study proposes a novel direction for addressing the shortage of labeled training data, one of the fundamental bottlenecks in supervised learning, by means of synthetic training data with reinforced domain knowledge. It also provides a solution for distinguishing between bot-generated and human-written clickbaits, thus aiding the work of moderators and better alerting news consumers.
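The evaluation idea in the abstract (add synthetic clickbaits to the training set, then compare AUC) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the names `augment` and `auc`, the label convention (1 = clickbait, 0 = not clickbait), and the scores in the usage note are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed convention: (headline, label) with 1 = clickbait, 0 = not clickbait.
Example = Tuple[str, int]


def augment(real: List[Example], synthetic_clickbaits: List[str]) -> List[Example]:
    """Return the real training set plus synthetic headlines, all labeled as clickbait."""
    return real + [(headline, 1) for headline in synthetic_clickbaits]


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example gets the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In use, one would train the same detector twice, once on the real data and once on `augment(real, synthetic)`, score a held-out set of real headlines with each, and compare the two `auc` values. For instance, `auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])` evaluates to 0.75, because three of the four positive/negative pairs are ranked correctly.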