
Unsupervised Construction of Topic-Based Twitter Lists


Abstract

The Twitter lists feature was launched in late 2009 and enables the creation of curated groups of Twitter users. Any user can be a list author and decide the basis on which other users are added to a list. The most popular lists are those associated with a topic. Twitter lists can be a powerful organisation tool, but their adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate them using two document representation techniques: the popular term frequency-inverse document frequency (TF-IDF) and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. The adjusted normalised information distance (ANID) is used to compare the clustering results yielded by k-means and affinity propagation. We find that the careful selection of a similarity measure, combined with the LDA topic model, can provide a user with a sensible starting point for list creation.
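The pipeline described in the abstract (represent each account as a document, compute similarities, cluster) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it covers only the TF-IDF representation with cosine similarity and a simple spherical k-means, and leaves out LDA, affinity propagation, the other similarity measures, and ANID. The account names, token lists, function names, and the fixed seed indices are all hypothetical, chosen only to keep the example deterministic.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict) per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, seeds, iterations=10):
    """Spherical k-means: assign by cosine similarity, then re-average centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old centroid if the cluster is empty
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical accounts, each reduced to the tokens of its tweets.
accounts = {
    "alice": "python code compiler python bug".split(),
    "bob": "code bug release python".split(),
    "carol": "football goal league match".split(),
    "dave": "match goal striker football".split(),
}

names = list(accounts)
vectors = tfidf_vectors([accounts[name] for name in names])
labels = kmeans(vectors, seeds=[0, 2])  # assumption: fixed seeds, not random init

# Group account names by cluster label to form candidate lists.
lists = {}
for name, label in zip(names, labels):
    lists.setdefault(label, []).append(name)
print(lists)
```

On this toy input the two programming accounts and the two football accounts land in separate clusters, each of which could serve as a starting point for a topic-based list. In practice the number of clusters and the initialisation would need tuning, and a topic-model representation may behave differently from TF-IDF.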
