Conference: International Conference on Data Warehousing and Knowledge Discovery
Genetic Algorithms-Based Symbolic Aggregate Approximation

Abstract

Time series data appear in a broad variety of economic, medical, and scientific applications. Because of their high dimensionality, time series data are managed by using representation methods. Symbolic representation has attracted particular attention because it allows algorithms and techniques from other fields of computer science to be applied to time series. The symbolic aggregate approximation method (SAX) is one of the most important symbolic representation techniques for time series data. SAX is based on the assumption of "high Gaussianity" of normalized time series, which permits it to use breakpoints obtained from Gaussian lookup tables. The use of these breakpoints is the heart of SAX. In this paper we show that this assumption of Gaussianity oversimplifies the problem and can result in very large errors in time series mining tasks. We present an alternative scheme, based on genetic algorithms (GASAX), to find the breakpoints. The new scheme does not assume any particular distribution of the data, nor does it require normalizing the data. We conduct experiments on different datasets and show that the new scheme clearly outperforms the original scheme.
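To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The first part shows classic SAX discretization with breakpoints taken from the standard normal distribution. The second part sketches how a genetic algorithm could instead search for breakpoints directly on unnormalized data. The abstract does not specify the fitness function or the genetic operators, so everything in the second part is an assumption made for illustration: the leave-one-out 1-NN error used as fitness, the symbol-index distance, truncation selection, uniform crossover, and Gaussian mutation. All function names are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def gaussian_breakpoints(alphabet_size):
    """Breakpoints cutting N(0, 1) into equiprobable regions (classic SAX)."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def znormalize(series):
    """Zero-mean, unit-variance scaling, as classic SAX expects."""
    mu, sigma = fmean(series), pstdev(series)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation; assumes len(series) >= n_segments."""
    n = len(series)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fmean(series[a:b]) for a, b in zip(bounds, bounds[1:])]


def symbolize(series, breakpoints, n_segments):
    """Map each segment mean to a symbol index using sorted breakpoints."""
    return [bisect_right(breakpoints, v) for v in paa(series, n_segments)]


def loo_error(breakpoints, data, labels, n_segments):
    """ASSUMED fitness: leave-one-out 1-NN error on the symbolic words,
    with a simple sum of symbol-index differences as the distance."""
    words = [symbolize(s, breakpoints, n_segments) for s in data]
    errors = 0
    for i, w in enumerate(words):
        nearest = min(
            (j for j in range(len(words)) if j != i),
            key=lambda j: sum(abs(a - b) for a, b in zip(w, words[j])),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(words)


def evolve_breakpoints(data, labels, alphabet_size, n_segments,
                       pop_size=20, generations=30, seed=0):
    """ASSUMED GA: truncation selection, uniform crossover, Gaussian mutation.
    Works on raw (unnormalized) series; needs alphabet_size >= 2, pop_size >= 4."""
    rng = random.Random(seed)
    lo = min(min(s) for s in data)
    hi = max(max(s) for s in data)

    def fitness(individual):
        return loo_error(individual, data, labels, n_segments)

    population = [
        sorted(rng.uniform(lo, hi) for _ in range(alphabet_size - 1))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            k = rng.randrange(len(child))
            step = rng.gauss(0, 0.1 * (hi - lo))  # mutate one breakpoint
            child[k] = min(hi, max(lo, child[k] + step))
            children.append(sorted(child))
        population = parents + children
    return min(population, key=fitness)
```

For the classic path, `symbolize(znormalize(series), gaussian_breakpoints(4), 4)` turns a series into a four-symbol word over a four-letter alphabet. For the evolved path, `evolve_breakpoints(data, labels, alphabet_size, n_segments)` returns a sorted breakpoint list that can be passed to `symbolize` on the raw series, without the normalization step.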
