
On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping

Abstract

Since the generation of multimedia data turned digital, interest in its storage, retrieval, and processing has grown explosively. This includes video, images, and audio, and expectations for exploiting the data at hand are now higher. Typical manipulations take some form of video/image/audio processing, including automatic speech recognition, which require a fairly large amount of storage and are computationally intensive. In our recent work, we demonstrated the utility of a time series representation for clustering multimedia data with the k-medoids method, which considerably reduces computational effort and storage space. However, k-means is a much more generic clustering method when Euclidean distance is used. In this work, we demonstrate that k-means clustering will unfortunately sometimes fail to give correct results, a fact that many researchers may overlook. This is especially the case when Dynamic Time Warping (DTW) is used as the distance measure in averaging the shape of time series. We also demonstrate that the current averaging algorithm may not produce the real average of the time series and thus generates incorrect k-means clustering results, and we then show potential causes why DTW averaging methods may not achieve meaningful clustering results. Lastly, we conclude by suggesting a method that could potentially find a shape-based time series average satisfying the required properties.
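To make the averaging concern in the abstract concrete, the following is a minimal sketch, not the paper's algorithm or experiments. It assumes a textbook DTW with absolute-difference local cost and no warping window, and it uses the plain pointwise (arithmetic) mean as the cluster "average", which is what standard k-means computes under Euclidean distance. The function name `dtw_distance` and the toy series `a` and `b` are hypothetical, chosen only for illustration.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW with |x_i - y_j| as local cost.

    Assumption: no warping-window constraint and no normalization.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # cost[i, j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # step in x only
                                 cost[i, j - 1],      # step in y only
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]

# Two toy series with the same shape (a single peak), shifted in time.
a = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])

# Pointwise mean, as used by Euclidean k-means to update a centroid.
pointwise_mean = (a + b) / 2.0

print("DTW(a, b)       =", dtw_distance(a, b))
print("DTW(mean, a)    =", dtw_distance(pointwise_mean, a))
print("peak of mean    =", pointwise_mean.max())
```

Under DTW the two series are identical in shape, yet their pointwise mean has two half-height peaks and is a nonzero DTW distance from both members. A centroid like this does not represent the shape of its cluster, which is one simple way an averaging step can undermine k-means when DTW is the distance measure.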
