Source: Advances in information retrieval (conference proceedings)

Detection of News Feeds Items Appropriate for Children

Abstract

Identifying child-appropriate web content is an important yet difficult classification task. This novel task involves determining age/child appropriateness (which is not necessarily topic-based) despite unbalanced class sizes and a lack of quality training data with human judgements of appropriateness. Classification of feeds, a subset of web content, presents further challenges due to their temporal nature and short document format. In this paper, we discuss these challenges and present baseline results for this task through an empirical study that classifies incoming news stories as appropriate (or not) for children. We show that while the naive Bayes approach produces a higher AUC, it is vulnerable to the imbalanced data problem, and that a support vector machine provides a more robust overall solution. Our research shows that classifying children's content is a non-trivial task with greater complexities than standard text-based classification. While the F-score values are consistent with other research examining age-appropriate text classification, we introduce a new problem with a new dataset.
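
The contrast the abstract draws between AUC and robustness under class imbalance can be illustrated with a small sketch. It is not the authors' method or data: the labels, the scores, the two hypothetical classifiers, the choice of label 1 for the minority class, and the 0.5 decision threshold are all assumptions made for illustration. The sketch shows that a classifier can rank items perfectly (high AUC) and still score an F-measure of zero on the minority class when its scores sit below the decision threshold.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive item outscores a
    random negative item (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at(labels: Sequence[int], scores: Sequence[float],
          threshold: float = 0.5) -> float:
    """F1 for the positive class when scores >= threshold are predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: 2 minority-class items (label 1) among 10 feed items.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical classifier A: perfect ranking, but every score is below 0.5.
scores_a = [0.40, 0.35, 0.30, 0.20, 0.10, 0.10, 0.05, 0.05, 0.02, 0.01]

# Hypothetical classifier B: one ranking mistake, but usable at the threshold.
scores_b = [0.90, 0.60, 0.70, 0.20, 0.10, 0.10, 0.05, 0.05, 0.02, 0.01]

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(f"{name}: AUC={auc(labels, scores):.4f}  F1={f1_at(labels, scores):.4f}")
```

On this toy data, classifier A has the higher AUC yet never predicts the minority class at the fixed threshold, while classifier B gives up a little AUC and recovers both minority items. This is one way a higher AUC and a weaker result under imbalance can coexist. It does not reproduce the paper's naive Bayes or SVM results.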
