Conference: International Joint Conference on Artificial Intelligence (IJCAI-11)

Unsupervised Learning of Patterns in Data Streams Using Compression and Edit Distance

Abstract

Many unsupervised learning methods for recognising patterns in data streams assume fixed-length data sequences, which makes them unsuitable for applications where sequences vary in length, such as speech recognition, behaviour recognition and text classification. To use these methods on variable-length sequences, a pre-processing step must manually segment the data and select appropriate features, which is often impractical in real-world applications. In this paper we propose an unsupervised learning method that handles variable-length data sequences by identifying structure in the data stream using text compression and the edit distance between 'words'. We demonstrate that this method can automatically cluster unlabelled data in a data stream and perform segmentation. We evaluate the method on both fixed-length and variable-length benchmark datasets, comparing it to the Self-Organising Map in the fixed-length case. The results show a promising improvement over baseline recognition systems.
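
The abstract names two ingredients, text compression to find 'words' in the stream and edit distance to compare them, without giving details. The following is a minimal sketch of how such a pipeline could fit together, not the authors' implementation. It assumes an LZ78-style dictionary parse as the compressor, Levenshtein distance as the edit distance, and a simple greedy grouping step. The function names `lz78_phrases`, `edit_distance` and `group_words` and the `max_dist` threshold are hypothetical and chosen only for illustration.

```python
def lz78_phrases(stream):
    """Split a symbol stream into phrases using an LZ78-style parse.

    Assumption: the abstract only says "text compression"; LZ78 is
    used here as one illustrative dictionary-based compressor.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in stream:
        current += symbol
        if current not in seen:
            # First time this phrase appears: record it as a new 'word'.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing phrase that was already in the dictionary.
        phrases.append(current)
    return phrases


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_words(words, max_dist=1):
    """Greedy grouping (hypothetical): put each word in the first group
    whose representative (its first member) is within max_dist edits."""
    groups = []
    for word in words:
        for group in groups:
            if edit_distance(word, group[0]) <= max_dist:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups
```

Because both the parse and the distance work on sequences of any length, a sketch like this needs no fixed-length windowing. For example, `group_words(lz78_phrases(stream))` would return groups of similar phrases for a string-encoded `stream`.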