
Swarm intelligence for clustering dynamic data sets for web usage mining and personalization.


Abstract

Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies, and most recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely decentralized control, collaborative learning, high exploration ability, and inspiration from "dynamic social" behavior. Thus FSI offers a natural choice for modeling dynamic social data and solving problems in such domains. One particular case of dynamic social data is online/web usage data, which is rich in information about user activities, interests and choices.

To support a better understanding of online user activities, we developed clustering algorithms that address two challenges of mining online usage data: the need for scalability to large data and the need to adapt clustering to dynamic data sets. To address the scalability challenge, we developed new clustering algorithms using a hybridization of traditional Flock-based clustering with faster K-Means based partitional clustering algorithms. We tested our algorithms on synthetic data, real UCI Machine Learning repository benchmark data, and a data set consisting of real Web user sessions. Having linear complexity with respect to the number of data records, the resulting algorithms are considerably faster than traditional Flock-based clustering (which has quadratic complexity). Moreover, our experiments demonstrate that scalability was gained without sacrificing quality. To address the challenge of adapting to dynamic data, we developed a dynamic clustering algorithm that can handle the following dynamic properties of online usage data: (1) new data records can be added at any time (for example, a new user joins the site); (2) existing data records can be removed at any time (for example, an existing user no longer subscribes to a service or is terminated for violating policies); (3) new parts of existing records can arrive at any time, or old parts of an existing data record can change.
For example, a user's record can change as a result of additional activity such as purchasing new products, returning a product, rating new products, or modifying the existing rating of a product. We tested our dynamic clustering algorithm on synthetic dynamic data and on a data set consisting of real online user ratings for movies. Our algorithm was shown to handle the dynamic nature of the data without sacrificing quality compared to a traditional Flock-based clustering algorithm that is re-run from scratch with each change in the data.

To help reduce online information overload, we developed a Flock-based recommender system to predict the interests of users, focusing in particular on collaborative filtering or social recommender systems. Our Flock-based recommender algorithm (FlockRecom) iteratively adjusts the positions and speeds of dynamic flocks of agents on a visualization panel, where each agent represents a user. It then generates the top-n recommendations for a user based on the ratings of the users represented by that user's neighboring agents. Our recommendation system was tested on a real data set consisting of online user ratings for a set of jokes, and compared to traditional user-based Collaborative Filtering (CF). Our results demonstrated that our recommender system starts performing at the same level of quality as traditional CF and then, with more iterations for exploration, surpasses CF's recommendation quality in terms of precision and recall.
Another unique advantage of our recommendation system compared to traditional CF is its ability to generate more variety or diversity in the set of recommended items.

Our contributions advance the state of the art in Flock-based SI for clustering and making predictions in dynamic Web usage data, and therefore have an impact on improving the quality of online services.

This natural analogy between SI and social behavior is the main motivation for the topic of investigation in this dissertation, with a focus on Flock-based systems, which have not been well investigated for this purpose. More specifically, we investigate the use of Flock-based SI to solve two related and challenging problems by developing algorithms that form critical building blocks of intelligent personalized websites, namely, (i) providing a better understanding of online users and their activities or interests, for example by using clustering techniques that can discover the groups hidden within the data; and (ii) reducing information overload by providing guidance to users on websites and services, typically by using web personalization techniques such as recommender systems. Recommender systems aim to recommend items that a user is likely to enjoy.
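
The neighbor-based step of a recommender like FlockRecom, as the abstract describes it, can be illustrated with a minimal Python sketch. It assumes the flocking iterations have already placed each user's agent at a 2D position, and it shows only how top-n items might then be drawn from nearby agents' ratings. The function name, the `radius` parameter, the mean-rating score, and the toy data are all hypothetical choices made for illustration; the abstract does not specify them, so this is not the dissertation's actual procedure.

```python
import math
from collections import defaultdict


def top_n_from_neighbors(user, positions, ratings, radius=1.0, n=3):
    """Return up to n items the user has not rated, ranked by the mean
    rating given by users whose agents lie within `radius` of the user's agent.

    Assumption: mean neighbor rating is used as the score; the source does
    not state how neighbor ratings are combined.
    """
    ux, uy = positions[user]
    seen = ratings.get(user, {})
    totals = defaultdict(float)
    counts = defaultdict(int)
    for other, (ox, oy) in positions.items():
        # Skip the user and any agent outside the neighborhood radius.
        if other == user or math.hypot(ox - ux, oy - uy) > radius:
            continue
        for item, score in ratings.get(other, {}).items():
            if item not in seen:
                totals[item] += score
                counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    # Highest mean first; ties broken by item name so the result is deterministic.
    return sorted(means, key=lambda item: (-means[item], item))[:n]


# Hypothetical agent positions (one per user) and joke ratings.
positions = {"ann": (0.0, 0.0), "bob": (0.5, 0.0), "cat": (0.0, 0.8), "dan": (5.0, 5.0)}
ratings = {
    "ann": {"joke1": 4.0},
    "bob": {"joke1": 5.0, "joke2": 3.0, "joke3": 4.5},
    "cat": {"joke2": 5.0, "joke4": 2.0},
    "dan": {"joke5": 5.0},
}
print(top_n_from_neighbors("ann", positions, ratings, radius=1.0, n=2))
# → ['joke3', 'joke2']
```

In this sketch, a user whose agent drifts into a different flock over further iterations would pick up a different neighborhood and therefore different recommendations, which is one plausible reading of how additional exploration could change the results.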

Bibliographic record

  • Author

    Saka, Esin

  • Author affiliation

    University of Louisville

  • Degree-granting institution: University of Louisville
  • Subject: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 172 p.
  • Total pages: 172
  • Original format: PDF
  • Language of text: English
