Published in: European Conference on Machine Learning and Knowledge Discovery in Databases

Ageing-Based Multinomial Naive Bayes Classifiers Over Opinionated Data Streams



Abstract

The long-term analysis of opinionated streams requires algorithms that predict the polarity of opinionated documents while adapting to different forms of concept drift: the class distribution may change, and so may the vocabulary used by the document authors. A key property of a stream classifier is adaptation to concept drifts and shifts, which is typically achieved through ageing of the data. Surprisingly, for one of the most popular classifiers, Multinomial Naive Bayes (MNB), no ageing has been considered thus far. MNB is particularly appropriate for opinionated streams because it allows word probabilities to be adjusted seamlessly as new words appear for the first time. However, to adapt properly to drift, MNB must also be extended to take the age of documents and words into account. In this study, we incorporate ageing into the learning process of MNB by introducing the notion of fading for words, based on the recency of the documents containing them. We propose two fading versions, gradual fading and aggressive fading; the latter discards old data at a faster pace. Our experiments with Twitter data show that the ageing-based MNBs outperform the standard accumulative MNB approach and recover very quickly in times of change. We experiment with different data granularities in the stream and different degrees of data ageing, and we show how they "work together" towards adaptation to change.
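To make the idea of fading concrete, the following is a minimal sketch of an MNB whose word and class counts lose weight with age. It is not the authors' method: the abstract does not give the fading functions, so the exponential decay, the `decay` rate, the Laplace smoothing, and the names `FadingMNB`, `learn`, and `predict` are all assumptions made for illustration. Setting `decay=0.0` reduces it to an accumulative MNB.

```python
import math
from collections import defaultdict


class FadingMNB:
    """Illustrative MNB with exponentially faded counts (assumed scheme)."""

    def __init__(self, decay=0.1):
        self.decay = decay                      # hypothetical fading rate
        self.word_counts = defaultdict(float)   # (word, label) -> faded count
        self.word_seen = {}                     # (word, label) -> last update time
        self.class_counts = defaultdict(float)  # label -> faded document count
        self.class_seen = {}                    # label -> last update time
        self.vocab = set()

    def _faded(self, counts, seen, key, now):
        # Weight a stored count by the time elapsed since its last update.
        if key not in seen:
            return 0.0
        return counts[key] * math.exp(-self.decay * (now - seen[key]))

    def learn(self, words, label, now):
        # Fade the old count up to `now`, then add the new observation.
        self.class_counts[label] = self._faded(
            self.class_counts, self.class_seen, label, now) + 1.0
        self.class_seen[label] = now
        for w in words:
            key = (w, label)
            self.word_counts[key] = self._faded(
                self.word_counts, self.word_seen, key, now) + 1.0
            self.word_seen[key] = now
            self.vocab.add(w)

    def predict(self, words, now):
        labels = list(self.class_seen)
        mass = {c: self._faded(self.class_counts, self.class_seen, c, now)
                for c in labels}
        total_docs = sum(mass.values())
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        best, best_score = None, -math.inf
        for c in labels:
            word_total = sum(
                self._faded(self.word_counts, self.word_seen, (w, c), now)
                for w in self.vocab)
            # Laplace-smoothed log prior plus log likelihoods on faded counts.
            score = math.log((mass[c] + 1.0) / (total_docs + len(labels)))
            for w in words:
                cnt = self._faded(self.word_counts, self.word_seen, (w, c), now)
                score += math.log((cnt + 1.0) / (word_total + vocab_size))
            if score > best_score:
                best, best_score = c, score
        return best
```

In this sketch a word that was strongly associated with one class long ago contributes almost nothing once recent documents tie it to another class, whereas with `decay=0.0` the older majority continues to dominate.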
