Journal: Data & Knowledge Engineering

Log sequence clustering for workflow mining in multi-workflow systems



Abstract

Current workflow mining efforts aim to discover process knowledge from user-system interaction logs and represent it as high-level workflow models. They assume that a system contains a single workflow model, or they rely on information that explicitly links each log sequence to its underlying workflow model. Such assumptions may not hold in multi-workflow systems, where instances of different workflow models are mixed together without being differentiated. To address this issue, this paper proposes applying sequence clustering methods to group similar log sequences together. Each sequence cluster corresponds to a workflow model, and the log sequences in the cluster are its instances. This paper investigates different similarity measures, both structure-based and user-based, as well as different clustering algorithms, including one-side clustering and co-clustering. To incorporate user factors into sequence clustering, which is novel relative to existing sequence clustering methods, this paper proposes modeling User Behavior Patterns (UBPs) as probability distributions over sequences and learning them from the event log. We represent a UBP as a Probabilistic Suffix Tree and use it to measure sequence similarity. The co-clustering method leverages the dyadic relationship between UBPs and log sequences to improve clustering accuracy. An experimental study indicates that user-based methods outperform structure-based methods in accuracy and are more effective at handling noise in the log and growth in log size. The UBP-sequence co-clustering method achieves the best performance, which indicates the effectiveness of incorporating user factors and applying co-clustering.
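To make the user-based similarity idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a simplified fixed-order suffix-context model with additive smoothing as a stand-in for a Probabilistic Suffix Tree, and it assigns each log sequence to the behavior pattern under which it is most likely. The class `SuffixContextModel`, the function `assign_to_patterns`, the pattern names, and the toy event names are all hypothetical; the co-clustering step described in the abstract is not shown.

```python
import math
from collections import defaultdict


class SuffixContextModel:
    """Simplified stand-in for a Probabilistic Suffix Tree (an assumption,
    not the paper's model): next-event counts for every context of length
    0..max_order, with back-off to the longest context seen in training."""

    def __init__(self, max_order=2, alpha=1.0):
        self.max_order = max_order
        self.alpha = alpha  # additive smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def fit(self, sequences):
        for seq in sequences:
            for i, event in enumerate(seq):
                self.alphabet.add(event)
                # Record the event under each suffix context up to max_order.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[tuple(seq[i - k:i])][event] += 1
        return self

    def prob(self, context, event):
        # Keep at most max_order trailing events, then shorten the context
        # until it is one that was observed during training.
        context = tuple(context)[-self.max_order:] if self.max_order else ()
        while context and context not in self.counts:
            context = context[1:]
        next_counts = self.counts.get(context, {})
        total = sum(next_counts.values())
        vocab = len(self.alphabet) + 1  # one extra slot for unseen events
        return (next_counts.get(event, 0) + self.alpha) / (total + self.alpha * vocab)

    def avg_log_likelihood(self, seq):
        # Length-normalized so long and short sequences are comparable.
        if not seq:
            return 0.0
        ll = sum(math.log(self.prob(seq[:i], e)) for i, e in enumerate(seq))
        return ll / len(seq)


def assign_to_patterns(sequences, models):
    """Label each sequence with the pattern that gives it the highest score."""
    return [
        max(models, key=lambda name: models[name].avg_log_likelihood(seq))
        for seq in sequences
    ]


# Toy event log (hypothetical): two behavior patterns mixed in one system.
browse_log = [
    ["login", "search", "view", "logout"],
    ["login", "search", "view", "view", "logout"],
]
upload_log = [
    ["login", "upload", "tag", "logout"],
    ["login", "upload", "upload", "tag", "logout"],
]
models = {
    "browse": SuffixContextModel().fit(browse_log),
    "upload": SuffixContextModel().fit(upload_log),
}
labels = assign_to_patterns(
    [["login", "search", "view", "logout"], ["login", "upload", "tag", "logout"]],
    models,
)
# labels == ["browse", "upload"]
```

In this sketch the similarity between a sequence and a behavior pattern is the sequence's average log-likelihood under that pattern's model, which is one plausible reading of "use it to measure sequence similarity"; the paper may define the measure differently.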
