Journal: International Journal of Data Warehousing and Mining

A New Similarity Metric for Sequential Data

Abstract

In many data mining applications, both classification and clustering algorithms require a distance or similarity measure. The central problem in similarity-based clustering and classification of sequential data is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and Cosine do not explicitly exploit the sequential nature of the data. In this paper, the authors propose a similarity-preserving function called the Sequence and Set Similarity Measure (S^3M), which captures both the order of occurrence of items in sequences and the constituent items of the sequences. The authors demonstrate the usefulness of the proposed measure for classification and clustering tasks. Experiments were conducted on two benchmark datasets, DARPA '98 and msnbc, for a classification task in the intrusion detection domain and a clustering task in the web mining domain. The results show the usefulness of the proposed measure.
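
The abstract does not state the formula for the measure, so the following is only an illustrative sketch of how a score could combine order information with set information. It assumes a weighted sum of two parts: the length of the longest common subsequence divided by the longer sequence length (order), and the Jaccard coefficient of the item sets (content). The function names and the weight `p` are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Jaccard coefficient of the item sets; two empty sequences score 1.0."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def sequence_set_similarity(
    a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.5
) -> float:
    """Hypothetical blend: p weights the order part, (1 - p) the set part.

    This is an assumed form for illustration, not the paper's definition.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    longest = max(len(a), len(b))
    order_part = lcs_length(a, b) / longest if longest else 1.0
    return p * order_part + (1.0 - p) * jaccard(a, b)


if __name__ == "__main__":
    s1 = ["login", "search", "cart", "pay"]
    s2 = ["search", "login", "cart", "pay"]
    # Same items in a different order: the set part is 1.0, the order part is lower.
    print(sequence_set_similarity(s1, s2, p=0.5))
```

With `p = 0` the sketch reduces to plain Jaccard similarity and ignores order; with `p = 1` it depends only on the common subsequence. The two example sessions contain the same pages, so any score below 1.0 comes entirely from the order term.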