Detecting polarization in ratings: An automated pipeline and a preliminary quantification on several benchmark data sets

Abstract

Personalized recommender systems are becoming increasingly relevant and important to the study of polarization and bias, given their widespread use in filtering information spaces. Polarization is a social phenomenon with serious real-life consequences, particularly on social media. It is therefore important to understand how machine learning algorithms, especially recommender systems, behave in polarized environments. In this paper, we study polarization within the context of users' interactions with a space of items and how this affects recommender systems. We first formalize the concept of polarization based on item ratings and then relate it to item reviews to investigate any potential correlation. We then propose a domain-independent data science pipeline that automatically detects polarization using ratings, rather than the properties typically used to detect polarization, such as item content or social network topology. We perform an extensive comparison of polarization measures on several benchmark data sets and show that our polarization detection framework can detect different degrees of polarization and outperforms existing measures in capturing an intuitive notion of polarization. Our work is an essential step toward quantifying and detecting polarization in ongoing ratings and in benchmark data sets; to this end, we use our polarization detection pipeline to compute the polarization prevalence of several benchmark data sets. We hope that this work will support future research on the emerging topic of designing and studying the behavior of recommender systems in polarized environments.
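The abstract does not spell out the authors' polarization measure. Purely as a point of reference, the sketch below computes a simple rating-based baseline: the variance of an item's rating histogram, normalized by the largest variance a bounded scale allows, plus a "prevalence" figure defined as the share of items whose score exceeds a threshold. This is not the pipeline proposed in the paper; the function names, the 1 to 5 rating scale, and the `threshold` parameter are illustrative assumptions.

```python
from collections import Counter


def rating_histogram(ratings, low=1, high=5):
    """Return the share of ratings at each integer level from low to high."""
    levels = range(low, high + 1)
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in levels:
            raise ValueError(f"rating {r!r} is not an integer level in [{low}, {high}]")
    counts = Counter(ratings)
    n = len(ratings)
    return {level: counts.get(level, 0) / n for level in levels}


def normalized_variance(ratings, low=1, high=5):
    """Variance of one item's ratings divided by the largest possible variance.

    On a scale bounded by low and high, variance peaks at ((high - low) / 2) ** 2
    when half the ratings sit at each extreme, so the score lies in [0, 1]:
    0 is full consensus and 1 is an even split between the two extremes.
    """
    hist = rating_histogram(ratings, low, high)
    mean = sum(level * share for level, share in hist.items())
    var = sum(share * (level - mean) ** 2 for level, share in hist.items())
    return var / ((high - low) / 2) ** 2


def polarization_prevalence(ratings_by_item, threshold, low=1, high=5):
    """Share of items whose normalized variance exceeds a chosen threshold.

    The threshold is a free, hypothetical parameter; it is not taken from the paper.
    """
    if not ratings_by_item:
        raise ValueError("need at least one item")
    flagged = sum(
        1
        for ratings in ratings_by_item.values()
        if normalized_variance(ratings, low, high) > threshold
    )
    return flagged / len(ratings_by_item)
```

For example, `normalized_variance([1, 5, 1, 5])` returns 1.0 and `normalized_variance([3, 3, 3, 3])` returns 0.0. A uniform spread such as `[1, 2, 3, 4, 5]` already scores about 0.5, which shows why variance alone is a blunt proxy for the intuitive "two opposing camps" notion of polarization.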

Bibliographic Details

Source: IEEE International Conference on Big Data | 2017 | pp. 2682-2690 | 9 pages
Authors: Mahsa Badami; Olfa Nasraoui; Welong Sun; Patrick Shafto
Original format: PDF
Keywords: Recommender systems; Feature extraction; Histograms; Pipelines; Social network services; Benchmark testing; Motion pictures

Similar Documents

1. A benchmarking of pipelines for detecting ncRNAs from RNA-Seq data [J]. Sebastiano Di Bella, Alessandro La Ferlita, Giovanni Carapezza. Briefings in Bioinformatics, 2020, no. 6

2. Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets [J]. Perscheid Cindy. BMC Bioinformatics, 2021, no. 1

3. Impact of benchmark data set topology on the validation of virtual screening methods: Exploration and quantification by spatial statistics [J]. Rohrer SG, Baumann K. Journal of Chemical Information and Modeling, 2008, no. 4

4. Detecting polarization in ratings: An automated pipeline and a preliminary quantification on several benchmark data sets [C]. Mahsa Badami, Olfa Nasraoui, Welong Sun. IEEE International Conference on Big Data, 2017

5. Application of Statistical and Machine Learning Techniques to Detect Rare Events in High Frequency Financial Data And Assess Corporate Credit Rating [D]. Golbayani, Parisa. 2019

6. Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets [O]. Cindy Perscheid. 2021

7. LTQ-iQuant: A freely available software pipeline for automated and accurate protein quantification of isobaric tagged peptide data from LTQ instruments [O]. Getiria Onsongo, Matthew D. Stone, Susan K. Van Riper. 2010

