Conference: International Conference on Very Large Data Bases

Matrix Profile IV: Using Weakly Labeled Time Series to Predict Outcomes


Abstract

Over the last decade, academic research on time series classification has made significant progress. However, much of this work makes assumptions that are unrealistic for deployed industrial applications: that data subsequences have a single fixed length, are precisely extracted from the data, and are correctly labeled according to their membership in a set of equal-size classes. In real-world industrial settings, patterns can be of different lengths, class annotations may apply only to a general region of the data and may contain errors, and the class distribution is typically highly skewed. Can we learn from such weakly labeled data? In this work, we introduce SDTS, a scalable algorithm that can learn in such challenging settings. We demonstrate the utility of our ideas by learning from diverse datasets with millions of datapoints. As we shall demonstrate, our domain-agnostic, parameter-free algorithm can be competitive with domain-specific algorithms used in neuroscience and entomology, even when those algorithms have been tuned by domain experts to incorporate domain knowledge.
