Conference proceedings: Database systems for advanced applications, Part 1

Searching for Quality Microblog Posts: Filtering and Ranking Based on Content Analysis and Implicit Links

Abstract

Today, social networking has become a popular web activity, with a large amount of information created by millions of people every day. However, research on searching such social information effectively is still in its infancy. In this paper, we focus on Twitter, a rapidly growing microblogging platform whose content is large in volume, diverse and of varying quality. To provide higher-quality content (e.g. posts mentioning news, events, useful facts or well-formed opinions) when a user searches for tweets on Twitter, we propose a new method to filter and rank tweets according to their quality. To model the quality of tweets, we devise a new set of link-based features in addition to content-based features. We examine the implicit links between tweets, URLs, hashtags and users, and then propose novel metrics that reflect the popularity as well as the quality-based reputation of websites, hashtags and users. We then evaluate both the content-based and link-based features in terms of classification effectiveness and identify an optimal feature subset that achieves the best classification accuracy. A detailed evaluation of our filtering and ranking models shows that the optimal feature subset outperforms the traditional bag-of-words representation while requiring significantly less computational time and storage. Moreover, we demonstrate that the proposed metrics based on implicit links are effective for determining the quality of tweets.
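
The abstract does not give the definitions of its link-based metrics, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical set of tweets already labeled as high or low quality, treats a shared URL domain as an implicit link between tweets, and computes two invented scores per domain: popularity (how many labeled tweets link to it) and reputation (the fraction of those tweets labeled high quality). The function names, the dictionary keys `urls` and `high_quality`, and the neutral default of 0.5 are all assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'.

    Assumes the URL includes a scheme such as http://; otherwise
    urlparse leaves the host empty.
    """
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_scores(labeled_tweets):
    """Map each domain to (popularity, reputation).

    popularity: number of labeled tweets that link to the domain.
    reputation: fraction of those tweets labeled high quality.
    Both definitions are illustrative assumptions, not the paper's metrics.
    """
    counts = defaultdict(int)
    good = defaultdict(int)
    for tweet in labeled_tweets:
        # A tweet that links to the same domain twice is counted once.
        for domain in {domain_of(u) for u in tweet["urls"]}:
            counts[domain] += 1
            good[domain] += int(tweet["high_quality"])
    return {d: (counts[d], good[d] / counts[d]) for d in counts}


def rank_tweets(tweets, scores, default=0.5):
    """Sort tweets by the mean reputation of the domains they link to.

    Tweets with no known domain receive the neutral `default` score.
    """
    def score(tweet):
        reps = [scores[d][1] for d in map(domain_of, tweet["urls"]) if d in scores]
        return sum(reps) / len(reps) if reps else default

    return sorted(tweets, key=score, reverse=True)
```

The same counting scheme could in principle be applied to hashtags or users in place of domains. In a fuller pipeline, such scores would be a few features among many fed to a classifier rather than a ranking signal on their own.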

Bibliographic details

  • Conference location: Busan (KR)
  • Author affiliations

    Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

    Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

    Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

  • Original format: PDF
  • Text language: English (eng)
  • Chinese Library Classification: TP311.13
