Employing Topological Data Analysis on Social Networks Data to Improve Information Diffusion


Abstract

Over the past decade, the number of users on social networks has grown tremendously, from thousands in 2004 to billions by the end of 2015. On social networks, users create and propagate billions of pieces of information every day. The data can take many forms (such as text, images, or videos). Due to the massive usage of social networks and the availability of data, the field of social network analysis and mining has attracted many researchers from academia and industry to analyze social network data and explore various research opportunities (including information diffusion and influence measurement).

Information diffusion is defined as the way that information spreads on social networks; this can occur due to social influence. Influence is the ability to affect others without direct commands. Influence on social networks can be observed through social interactions between users (such as a retweet on Twitter, a like on Instagram, or a favorite on Flickr). In order to improve information diffusion, we measure the influence of users on social networks to predict influential users. The ability to predict the popularity of posts can improve information diffusion as well; posts become popular when they diffuse on social networks. However, measuring influence and predicting post popularity can be challenging because the data are unstructured, large, and noisy. Therefore, social network mining and analysis techniques are essential for extracting meaningful information about influential users and popular posts.

For measuring the influence of users, we propose a novel influence measurement that integrates both users' structural locations and their characteristics on social networks, which can then be used to predict influential users on social networks. Centrality analysis techniques are adapted to identify the users' structural locations. Centrality is used to identify the most important nodes within a graph; social networks can be represented as graphs (where nodes represent users and edges represent interactions between users), so centrality analysis can be applied to them.

The second part of the work focuses on predicting the popularity of images on social networks over time. The effects of social context, image content, and early popularity on image popularity are analyzed using machine learning algorithms. A new approach to representing image content, called the keyword vector, is developed to capture the semantics of an image from its captions. This approach is based on Word2vec (an unsupervised two-layer neural network that generates distributed numerical vectors representing words in a vector space so that similarity can be detected) and k-means (a popular clustering algorithm). However, machine learning algorithms do not address issues arising from the nature of social network data, such as noise and high dimensionality. Therefore, topological data analysis is adopted. It is a novel approach for extracting meaningful information from high-dimensional data and is robust to noise. It is based on topology and aims to study the geometric shape of data. In this thesis, we explore the feasibility of topological data analysis for mining social network data by addressing the problem of image popularity.

The proposed techniques are applied to datasets crawled from real-world social networks to examine the performance of each approach. For predicting influential users, the proposed measurement outperforms existing measurements in terms of correlation. For predicting the popularity of images on social networks, the results indicate that the proposed features are promising and exceed related work in terms of accuracy. Further exploration of these research topics can be used for a variety of real-world applications (including improving viral marketing, public awareness, political standings, and charity work).
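The graph representation mentioned in the abstract (nodes as users, edges as interactions) can be illustrated with a minimal Python sketch. It computes normalized degree centrality, which is only one standard centrality measure; the abstract does not say which measures the thesis uses, so this is not the author's method. The function name `degree_centrality`, the user names, and the toy interaction list are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(interactions):
    """Normalized degree centrality for an undirected interaction graph.

    `interactions` is an iterable of (user_a, user_b) pairs. This is an
    illustrative assumption, not the influence measurement proposed in
    the thesis.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {user: 0.0 for user in neighbors}
    # Divide by the maximum possible number of neighbors (n - 1).
    return {user: len(adj) / (n - 1) for user, adj in neighbors.items()}


# Hypothetical toy data: each pair is one interaction (e.g. a retweet or like).
interactions = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
]
scores = degree_centrality(interactions)
most_central = max(scores, key=scores.get)
```

In this toy graph, `alice` interacts with every other user and therefore receives the highest score. A measure like the one described in the abstract would combine such structural scores with user characteristics, which this sketch does not attempt.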

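Bibliographic details

  • Author

    Almgren, Khaled

  • Author affiliation

    University of Bridgeport

  • Degree-granting institution: University of Bridgeport
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2018
  • Pagination: 145 p.
  • Total pages: 145
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Agricultural chemistry
  • Keywords

  • Date added to database: 2022-08-17 11:53:31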
