International Conference on Data Science, Machine Learning and Applications

Grouping Users Through Pair Wise Sequence Alignment and Graph Traversal Based on Web Page Navigation Behaviour

Abstract

The growing number of online users offers many opportunities for research in web mining. User grouping plays a major role in web personalization: identifying consortiums of similar users has attracted increasing interest among web researchers because it lets a customized, better environment be provided to those users. This paper attempts to find user consortiums based on users' web page navigation patterns. The methodology first interprets the input navigation sequences in different ways and then investigates how the interpretation and the thresholding level influence user grouping by graph traversal. For each interpretation, a global pairwise sequence alignment is carried out between the web navigation sequences of every pair of users. A similarity matrix is then formulated from the number of aligned pages and the number of unique pages of each pair of users. Next, the similarity matrix is thresholded at different levels relative to the maximum value in each column (column thresholding) or in each row (row thresholding). Graph traversal is then performed on the thresholded matrix to identify the user groups. The methodology is evaluated on the publicly available MSNBC dataset, and the Jaccard similarity coefficient is used to measure inter-group similarity. The influence of the thresholding type and of the threshold level is then investigated. The results show that using sorted input navigation sequences without redundancy, together with column thresholding at the 0.75 level, yields the best outcome in forming the user groups.
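The pipeline outlined in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the alignment scores, the exact similarity formula, or the exact thresholding rule, so the sketch assumes: Needleman-Wunsch global alignment with match +1, mismatch -1 and gap -1; a similarity equal to the number of aligned (identical) pages divided by the number of unique pages of the pair; column thresholding that keeps an entry when it reaches `level` times its column maximum (row thresholding uses the row maximum instead); an undirected edge whenever either direction survives thresholding; breadth-first search for connected components; and inter-group Jaccard similarity computed on the union of pages visited by each group's members. All function names (`interpret`, `aligned_matches`, `similarity_matrix`, `column_threshold`, `row_threshold`, `user_groups`, `jaccard`) are hypothetical, and the page sequences are toy data rather than MSNBC records.

```python
from collections import deque
from itertools import combinations


def interpret(seq, sort=True, dedup=True):
    """Hypothetical preprocessing: drop repeated pages, then optionally sort."""
    if dedup:
        seq = list(dict.fromkeys(seq))  # keeps the first occurrence of each page
    return sorted(seq) if sort else list(seq)


def aligned_matches(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment (assumed scores); returns how many
    aligned positions pair identical pages in one optimal alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    i, j, matches = n, m, 0
    while i > 0 and j > 0:  # traceback from the bottom-right cell
        same = a[i - 1] == b[j - 1]
        if score[i][j] == score[i - 1][j - 1] + (match if same else mismatch):
            matches += int(same)
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matches


def similarity_matrix(seqs):
    """Assumed similarity: aligned pages divided by the pair's unique pages."""
    k = len(seqs)
    sim = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        union = len(set(seqs[i]) | set(seqs[j]))
        sim[i][j] = sim[j][i] = aligned_matches(seqs[i], seqs[j]) / union if union else 0.0
    return sim


def column_threshold(sim, level=0.75):
    """Keep sim[i][j] if it reaches `level` x the maximum of column j (diagonal excluded)."""
    k = len(sim)
    col_max = [max((sim[i][j] for i in range(k) if i != j), default=0.0) for j in range(k)]
    return [[i != j and col_max[j] > 0 and sim[i][j] >= level * col_max[j]
             for j in range(k)] for i in range(k)]


def row_threshold(sim, level=0.75):
    """Row variant: column thresholding on the transpose, transposed back."""
    keep_t = column_threshold([list(r) for r in zip(*sim)], level)
    return [list(r) for r in zip(*keep_t)]


def user_groups(keep):
    """Treat kept entries as undirected edges; BFS yields connected components."""
    k, seen, groups = len(keep), set(), []
    for start in range(k):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            u = queue.popleft()
            group.append(u)
            for v in range(k):
                if v not in seen and (keep[u][v] or keep[v][u]):
                    seen.add(v)
                    queue.append(v)
        groups.append(sorted(group))
    return groups


def jaccard(pages_a, pages_b):
    """Jaccard coefficient |A & B| / |A | B| between two page sets."""
    union = pages_a | pages_b
    return len(pages_a & pages_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy page-category sequences, made up for illustration (not MSNBC data).
    raw = [["frontpage", "news", "news", "tech"], ["news", "frontpage", "tech"],
           ["sports", "weather", "sports", "news"], ["weather", "sports"]]
    seqs = [interpret(s) for s in raw]
    groups = user_groups(column_threshold(similarity_matrix(seqs), level=0.75))
    print(groups)  # [[0, 1], [2, 3]]
    pages = [set().union(*(seqs[u] for u in g)) for g in groups]
    print(jaccard(pages[0], pages[1]))  # 0.2
```

With the toy input, users 0 and 1 (identical page sets after sorting and de-duplication) form one group and users 2 and 3 form another; the two groups share only the `news` page, giving an inter-group Jaccard value of 0.2. Sorting discards the order of visits, so the sorted interpretation compares which pages were visited; passing `sort=False` to `interpret` keeps the original order.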