Journal: Foundations of computing and decision sciences

AN EMPIRICAL STUDY OF CONTEXT BASED SEQUENTIAL PATTERN MINING ALGORITHMS EFFICIENCY


Abstract

Methods for detecting patterns in data sets are useful and widely demanded tools in the knowledge discovery process. The problem of searching for patterns in a set of sequences is called Sequential Pattern Mining. It can be defined as finding frequent subsequences in a sequence database. The pattern selection procedure is simple to state: to become a pattern, a subsequence must be contained in at least the required number of sequences from the database. The number of sequences that contain a pattern is called the pattern's support. Finding patterns may look trivial, but doing so efficiently is not. Efficiency becomes crucial when the required support is lowered, because the number of mined patterns may grow exponentially. Moreover, the situation may change if the problem of Sequential Pattern Mining is extended further. In the classic definition, a sequence is an ordered list of elements, each of which is a non-empty set of items. Context Based Sequential Pattern Mining adds uniform, multi-attribute contexts (vectors) to the elements of the sequence and to the sequence itself. Introducing contexts significantly enlarges the search space of the problem. However, it also brings additional opportunities to constrain the mining process. This extension requires new algorithms: traditional ones cannot cope with non-nominal data directly, and algorithms derived directly from traditional ones have been shown to be inefficient. This study evaluates the efficiency of the novel ContextMapping and ContextMappingHeuristic algorithms, which are designed to solve the problem of Context Based Sequential Pattern Mining. It examines the extent to which the algorithms' parameterization affects mining cost and accuracy. It also relates the modified problem to the traditional one, pointing out their common and differing properties and outlining perspectives for further research.
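As an illustration of the classic (context-free) definition of support given in the abstract, the following minimal Python sketch counts how many sequences in a small database contain a candidate subsequence. It is not the ContextMapping or ContextMappingHeuristic algorithm, and it ignores contexts entirely; the function names (`contains`, `support`, `is_frequent`) and the toy database are assumptions made for this example only.

```python
from typing import FrozenSet, List

Element = FrozenSet[str]        # one element: a non-empty set of items
Sequence = List[Element]        # a sequence: an ordered list of elements


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern element must be a subset of some sequence element,
    and the matched sequence elements must appear in the same order.
    Matching each pattern element as early as possible is sufficient.
    """
    matched = 0
    for element in sequence:
        if matched < len(pattern) and pattern[matched] <= element:
            matched += 1
    return matched == len(pattern)


def support(database: List[Sequence], pattern: Sequence) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))


def is_frequent(database: List[Sequence], pattern: Sequence, min_support: int) -> bool:
    """A subsequence is a pattern if its support reaches the required minimum."""
    return support(database, pattern) >= min_support


# Hypothetical toy database of three sequences (illustrative data only).
database: List[Sequence] = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("ab"), frozenset("c"), frozenset("d")],
    [frozenset("b"), frozenset("a"), frozenset("d")],
]

pattern_a_then_d: Sequence = [frozenset("a"), frozenset("d")]
pattern_a_then_c: Sequence = [frozenset("a"), frozenset("c")]

if __name__ == "__main__":
    print(support(database, pattern_a_then_d))
    print(support(database, pattern_a_then_c))
    print(is_frequent(database, pattern_a_then_c, min_support=3))
```

Lowering `min_support` admits more candidate subsequences as patterns, which is why the abstract stresses efficiency at low support thresholds.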
