
Search result clustering for Thai Twitter based on Suffix Tree Clustering


Abstract

Twitter has become a popular online medium for posting and sharing news and events. Many Twitter posts, or "tweets", refer to the same topics or events, so a search on Twitter can return a long list of results. To address this problem, we propose an approach for clustering Twitter search results based on the Suffix Tree Clustering (STC) algorithm. The original STC, however, has two main drawbacks: some of the returned cluster labels are not meaningful, and it cannot create a hierarchical structure. In this paper, we present a new approach called Suffix Tree Clustering with Label Merging (STC-LM). The key idea of STC-LM is to merge partially overlapping cluster labels and then create a two-level label structure. We performed experiments using Thai Twitter posts from 12 topics such as flooding, traffic, and entertainment. The approach achieves an F1 measure of 70%, an improvement of 9% over the baseline method.
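
The label-merging idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not say how partial overlap is detected, so the code assumes two labels overlap when a suffix of one equals a prefix of the other. The names `overlap_merge` and `merge_labels`, the greedy pairwise pass, and the English example labels are all hypothetical. Real Thai tweets would first need word segmentation, and the base clusters would come from a suffix tree rather than a hand-written dictionary.

```python
from typing import Dict, Optional, Set, Tuple

Label = Tuple[str, ...]  # a cluster label as a sequence of words


def overlap_merge(a: Label, b: Label) -> Optional[Label]:
    """Merge a and b if a suffix of a equals a prefix of b (assumed overlap rule)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


def merge_labels(
    clusters: Dict[Label, Set[int]]
) -> Dict[Label, Dict[Label, Set[int]]]:
    """Build a two-level structure: merged label -> original labels -> tweet ids.

    Greedy and pairwise only: each base cluster joins at most one merged label.
    """
    hierarchy: Dict[Label, Dict[Label, Set[int]]] = {}
    used: Set[Label] = set()
    labels = sorted(clusters)  # fixed order so the result is deterministic
    for a in labels:
        if a in used:
            continue
        for b in labels:
            if b == a or b in used:
                continue
            # Try both orders, since the overlap may run in either direction.
            merged = overlap_merge(a, b) or overlap_merge(b, a)
            if merged is not None:
                hierarchy[merged] = {a: clusters[a], b: clusters[b]}
                used.update({a, b})
                break
        else:
            # No partner found: keep the label as its own top-level entry.
            hierarchy[a] = {a: clusters[a]}
            used.add(a)
    return hierarchy


# Hypothetical base clusters: label -> ids of the tweets containing that phrase.
base_clusters = {
    ("heavy", "flood"): {1, 2},
    ("flood", "in", "bangkok"): {2, 3},
    ("traffic", "jam"): {4},
}
two_level = merge_labels(base_clusters)
for top, children in two_level.items():
    print(" ".join(top), "->", [" ".join(c) for c in children])
```

In this toy input, "heavy flood" and "flood in bangkok" share the word "flood" at their boundary, so they are grouped under one longer top-level label, while "traffic jam" stays on its own. A fuller version would likely also need a rule for merging more than two labels and for scoring which merged label is most meaningful, neither of which the abstract specifies.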
