Knowledge and Information Systems

Tools and approaches for topic detection from Twitter streams: survey

Abstract

Detecting topics in Twitter streams has become an important task, with applications in fields such as natural disaster warning, user opinion assessment, and traffic prediction. In this article, we outline different types of topic detection techniques and evaluate their performance. We group the techniques into five categories: clustering, frequent pattern mining, exemplar-based, matrix factorization, and probabilistic models. For clustering, we discuss and evaluate nine techniques: sequential k-means, spherical k-means, kernel k-means, scalable kernel k-means, incremental batch k-means, DBSCAN, spectral clustering, document-pivot clustering, and Bngram. For matrix factorization, we analyze five techniques: sequential Latent Semantic Indexing (LSI), stochastic LSI, Alternating Least Squares (ALS), Rank-one Downdate (R1D), and Column Subset Selection (CSS). We also evaluate several techniques in the frequent pattern mining, exemplar-based, and probabilistic model categories. Results on three Twitter datasets show that Soft Frequent Pattern Mining (SFM) and Bngram achieve the best term precision, while CSS achieves the best term recall and topic recall in most cases. Exemplar-based topic detection strikes a good balance between term recall and term precision while also achieving good topic recall and running time.
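
The abstract names the surveyed methods but does not say how any of them is implemented. As an illustration of the streaming setting, here is a minimal sketch of one of them, sequential (online, MacQueen-style) k-means. It rests on our own assumptions, not the paper's: tweets are raw term-count vectors built by whitespace tokenization, the first k tweets seed the centroids, distance is squared Euclidean, and each cluster is summarized as a "topic" by its highest-weighted terms. The names `tokenize`, `sq_dist`, `SequentialKMeans`, and `top_terms` are hypothetical. The survey's actual preprocessing, weighting (for example TF-IDF), and initialization may differ.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, split on whitespace, keep tokens longer than 2 chars."""
    return [tok for tok in text.lower().split() if len(tok) > 2]


def sq_dist(a: dict[str, float], b: dict[str, float]) -> float:
    """Squared Euclidean distance between two sparse term vectors (missing terms count as 0)."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


class SequentialKMeans:
    """Online k-means: each tweet is seen once and moves only its nearest centroid."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: list[dict[str, float]] = []
        self.counts: list[int] = []

    def update(self, text: str) -> int:
        """Assign one tweet to a cluster, update that centroid, and return the cluster id."""
        x = {t: float(c) for t, c in Counter(tokenize(text)).items()}
        if len(self.centroids) < self.k:
            # Assumed seeding: the first k tweets become the initial centroids.
            self.centroids.append(dict(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = min(range(self.k), key=lambda i: sq_dist(x, self.centroids[i]))
        self.counts[j] += 1
        n = self.counts[j]
        c = self.centroids[j]
        # Incremental mean update: c <- c + (x - c) / n
        for term in c.keys() | x.keys():
            c[term] = c.get(term, 0.0) + (x.get(term, 0.0) - c.get(term, 0.0)) / n
        return j

    def top_terms(self, j: int, m: int = 3) -> list[str]:
        """Describe cluster j by its m highest-weighted terms (ties broken alphabetically)."""
        ranked = sorted(self.centroids[j].items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[:m]]


if __name__ == "__main__":
    stream = [
        "earthquake hits city tonight",
        "traffic jam on highway",
        "strong earthquake felt downtown",
        "highway traffic delays again",
    ]
    model = SequentialKMeans(k=2)
    print([model.update(tweet) for tweet in stream])  # [0, 1, 0, 1]
    print(model.top_terms(0), model.top_terms(1))
```

Each tweet costs one pass over k centroids, which is what makes the sequential variant attractive for streams. The trade-off is sensitivity to arrival order and to the seeding choice, which batch variants such as incremental batch k-means try to reduce.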