International AAAI Conference on Weblogs and Social Media

Filtering Noisy Web Data by Identifying and Leveraging Users' Contributions



Abstract

In this paper we present several methods for collecting Web textual content and filtering noisy data. We show that knowing which user publishes which content can contribute to detecting noise. We begin by collecting data from two forums and from Twitter. For the forums, we extract the meaningful information from each discussion (texts of questions and answers, user IDs, dates). For the Twitter dataset, we first detect tweets with very similar texts, which helps avoid redundancy in further analysis. This also yields clusters of tweets that can be used in the same way as the forum discussions: both can be modeled by bipartite graphs. The analysis of the nodes of the resulting graphs shows that network structure and content type (noisy or relevant) are not independent, so studying the network can help in filtering noise.
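The pipeline the abstract describes (clustering near-duplicate tweets, then linking users to the resulting clusters in a bipartite graph) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the sample tweets, the Jaccard token-set similarity, and the 0.6 threshold are all assumptions made for the example.

```python
# Hypothetical sample data: (user_id, text) pairs standing in for tweets.
tweets = [
    ("u1", "great tips for cleaning noisy web data"),
    ("u2", "great tips for cleaning noisy web data!!"),
    ("u3", "buy cheap followers now"),
    ("u1", "buy cheap followers now fast"),
]

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

# Step 1: greedily cluster near-duplicate tweets.
# A tweet joins the first cluster whose representative (first tweet)
# is similar enough; the 0.6 threshold is purely illustrative.
clusters = []  # each cluster: list of (user, text)
for user, text in tweets:
    for cluster in clusters:
        if jaccard(text, cluster[0][1]) >= 0.6:
            cluster.append((user, text))
            break
    else:
        clusters.append([(user, text)])

# Step 2: build the bipartite graph users <-> tweet clusters,
# stored as two adjacency dicts (one per side of the bipartition).
user_to_clusters = {}
cluster_to_users = {}
for cid, cluster in enumerate(clusters):
    for user, _ in cluster:
        user_to_clusters.setdefault(user, set()).add(cid)
        cluster_to_users.setdefault(cid, set()).add(user)

print(len(clusters))                   # number of tweet clusters → 2
print(sorted(user_to_clusters["u1"]))  # clusters u1 contributed to → [0, 1]
```

Degree distributions and other structural properties of the two node sets in such a graph are the kind of signal the paper reports as correlated with content type (noisy vs. relevant).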

