Journal: Fuzzy Sets and Systems

Developing an efficient knowledge discovering model for mining fuzzy multi-level sequential patterns in sequence databases


Abstract

Sequential pattern mining from sequence databases has been recognized as an important data mining problem with various applications. Items in a sequence database can be organized into a concept hierarchy according to a taxonomy. Based on the hierarchy, sequential patterns can be found not only at the leaf nodes (individual items) of the hierarchy but also at higher levels; this is called multiple-level sequential pattern mining. In previous research, however, taxonomies were based on crisp relationships between any two levels and could not handle the uncertainty and fuzziness of real life. For example, tomatoes could be classified into the Fruit category but could also be regarded as belonging to the Vegetable category. To deal with the fuzzy nature of taxonomies, Chen and Huang developed a novel knowledge discovery model for mining fuzzy multi-level sequential patterns, in which the relationship from one level to another is represented by a value between 0 and 1. In their work, a generalized sequential patterns (GSP)-like algorithm was developed to find fuzzy multi-level sequential patterns. This algorithm, however, faces a difficult problem: the mining process may have to generate and examine a huge set of combinatorial subsequences, and it requires multiple scans of the database. In this paper, we propose a new, efficient algorithm based on the divide-and-conquer strategy to mine this type of pattern. In addition, another efficient algorithm is developed to discover fuzzy cross-level sequential patterns. Because the proposed algorithm greatly reduces the effort of candidate subsequence generation, performance improves significantly. Experiments show that the proposed algorithm is much more efficient and scalable than the previous one. For mining real-life databases, our work enhances the model's practicability and could promote more applications in business.
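
To make the idea of a fuzzy taxonomy concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes that each item carries a membership degree in [0, 1] for each higher-level category, that one occurrence of a pattern is scored by the minimum membership among its matched items, that a sequence takes the best score over all occurrences, and that support is the average score over the database. The names FUZZY_TAXONOMY, membership, match_degree and fuzzy_support, and all membership values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical fuzzy taxonomy: item -> {category: membership degree in [0, 1]}.
# The degrees are illustrative only; they are not taken from the paper.
FUZZY_TAXONOMY: Dict[str, Dict[str, float]] = {
    "Tomato": {"Fruit": 0.7, "Vegetable": 0.6},
    "Apple": {"Fruit": 1.0},
    "Cabbage": {"Vegetable": 1.0},
}


def membership(item: str, category: str) -> float:
    """Degree to which a leaf item belongs to a higher-level category."""
    return FUZZY_TAXONOMY.get(item, {}).get(category, 0.0)


def match_degree(sequence: List[str], pattern: List[str]) -> float:
    """Best degree to which `pattern` (categories) occurs, in order, in `sequence`.

    Assumption: one occurrence is scored by the minimum membership of its
    matched items, and the sequence takes the maximum over all occurrences.
    """
    # best[j] = best degree found so far for the first j pattern categories.
    best = [1.0] + [0.0] * len(pattern)
    for item in sequence:
        # Right to left, so one item never fills two pattern positions.
        for j in range(len(pattern), 0, -1):
            candidate = min(best[j - 1], membership(item, pattern[j - 1]))
            best[j] = max(best[j], candidate)
    return best[len(pattern)]


def fuzzy_support(database: List[List[str]], pattern: List[str]) -> float:
    """Average match degree of `pattern` over a non-empty sequence database."""
    return sum(match_degree(seq, pattern) for seq in database) / len(database)


if __name__ == "__main__":
    db = [["Tomato", "Cabbage"], ["Apple", "Tomato"], ["Cabbage", "Apple"]]
    print(fuzzy_support(db, ["Fruit", "Vegetable"]))
```

With a crisp taxonomy every membership would be 0 or 1 and the support would reduce to the usual fraction of sequences containing the pattern. Other aggregation choices (for example, product instead of minimum) are equally plausible; the abstract does not specify which one the authors use.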
