Published in: International Journal of Computer Science and Network Security

Automatic Detection of News Articles of Interest to Regional Communities

Abstract

In this paper, we devise an approach for identifying and classifying content of interest to geographic communities in streams of news articles. We first briefly survey related work and then present our approach, which consists of 1) filtering out content irrelevant to the communities and 2) classifying the remaining relevant news articles. Using a confidence threshold, the filtering and classification tasks can be performed in one pass with the weights learned by the same algorithm. We use Bayesian text classification, and because Web-crawled corpora show substantial empirical class imbalance, we test several approaches: Naive Bayes, Complement Naive Bayes (CNB), {1,2,3}-grams, and oversampling. In our experiments on Japanese prefectures, we find that 3-gram CNB with oversampling is the most effective approach in terms of precision, while keeping training and testing times acceptable.
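The one-pass idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how confidence is computed, so the margin-based score below is an assumption, as are the function names (`ngrams`, `oversample`, `train_cnb`, `classify`), the toy romanized corpus, the whitespace tokenizer (real Japanese text would need a proper segmenter), and the threshold value of 0.5. The sketch uses plain Complement Naive Bayes with Laplace smoothing and no weight normalization.

```python
import math
import random
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return all word n-grams of the text for n = 1..max_n."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def oversample(docs, labels, seed=0):
    """Duplicate random minority-class documents until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    target = max(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_docs.extend(items + extra)
        out_labels.extend([label] * target)
    return out_docs, out_labels


def train_cnb(docs, labels, max_n=3, alpha=1.0):
    """Complement NB: each class weight is estimated from documents NOT in that class."""
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(ngrams(doc, max_n))
    total = sum(counts.values(), Counter())   # feature counts over all classes
    grand_total = sum(total.values())
    weights = {}
    for label, own in counts.items():
        denom = grand_total - sum(own.values()) + alpha * len(total)
        # A feature that is rare outside the class gets a large weight for the class.
        weights[label] = {f: -math.log((total[f] - own[f] + alpha) / denom)
                          for f in total}
    return weights


def classify(text, weights, threshold, max_n=3):
    """Return (label, confidence); label is None when the article is filtered out.

    Assumption: confidence is the per-feature margin between the two best
    class scores. The source does not specify its confidence measure.
    """
    vocab = next(iter(weights.values()))
    feats = [f for f in ngrams(text, max_n) if f in vocab]
    if not feats:
        return None, 0.0
    scores = {c: sum(w[f] for f in feats) for c, w in weights.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    margin = (scores[ranked[0]] - scores[ranked[1]]) / len(feats)
    return (ranked[0] if margin >= threshold else None), margin


# Toy, deliberately imbalanced corpus (hypothetical data, not from the paper).
docs = [
    "tokyo tower lights up for new year",
    "tokyo metro extends shibuya line",
    "tokyo governor opens shinjuku park",
    "osaka castle hosts spring festival",
    "sapporo snow festival opens in hokkaido",
]
labels = ["Tokyo", "Tokyo", "Tokyo", "Osaka", "Hokkaido"]

bal_docs, bal_labels = oversample(docs, labels)
weights = train_cnb(bal_docs, bal_labels)
for text in ["osaka castle festival", "festival", "stock market prices rise"]:
    print(text, "->", classify(text, weights, threshold=0.5))
```

In this sketch a single scoring pass yields both decisions: an article whose margin falls below the threshold is treated as irrelevant to every community, and any other article is assigned to the top-scoring class. A threshold would in practice be tuned on held-out data.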