Conference: European Conference on Machine Learning and Knowledge Discovery in Databases

A Layered Dirichlet Process for Hierarchical Segmentation of Sequential Grouped Data

Abstract

We address the problem of hierarchical segmentation of sequential grouped data, such as a collection of textual documents, and propose a Bayesian nonparametric approach for this problem. Existing Bayesian nonparametric models such as the sticky HDP-HMM are suitable only for single-layer segmentation. We propose the Layered Dirichlet Process (LaDP), where each layer has a countable set of Dirichlet Processes, draws from which define a distribution over the countable set of Dirichlet Processes at the next layer. Each data item gets assigned to a distribution (index) from each layer of the hierarchy, leading to hierarchical segmentation of the sequence. The complexity of inference depends upon the exchangeability assumptions for the measures at different layers. We propose a new notion of exchangeability called Block Exchangeability, which lies between Markov Exchangeability (used in HDP-HMM) and Complete Group Exchangeability (used in HDP), and allows for faster inference than Markov Exchangeability. Using experiments on a news transcript dataset and a product review dataset, we show that LaDP generalizes better than existing nonparametric models for sequential data, and by simultaneously segmenting at multiple levels, outperforms existing models in terms of single-layer segmentation. We also show empirically that using Block Exchangeability greatly speeds up inference and allows trading off accuracy for execution time.
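The layered assignment described in the abstract (each data item receives one index per layer, and the index chosen at one layer selects the distribution used at the next) can be illustrated with a rough generative sketch. This is not the paper's model or inference procedure. It assumes a finite truncation at every layer, independent stick-breaking weights for every distribution, and fully exchangeable items with no sequential dependence. The names `stick_breaking`, `sample_layered_assignments`, `layer_sizes`, and `alpha` are hypothetical and chosen only for illustration.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights with concentration alpha (sums to 1)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last index
    return weights


def sample_layered_assignments(n_items, layer_sizes, alpha=1.0, seed=0):
    """Draw one index per layer for each item, top layer first.

    ASSUMPTION: layer_sizes[l] is a truncation level for layer l; the
    abstract describes countably infinite sets, which are cut off here.
    ASSUMPTION: items are drawn independently; the sequential structure
    that the paper models is ignored in this sketch.
    """
    rng = random.Random(seed)
    # Weights over the indices of the top layer.
    top = stick_breaking(alpha, layer_sizes[0], rng)
    # transitions[l][k]: weights over layer l+1 indices, given index k at layer l.
    transitions = [
        [stick_breaking(alpha, layer_sizes[l + 1], rng)
         for _ in range(layer_sizes[l])]
        for l in range(len(layer_sizes) - 1)
    ]
    assignments = []
    for _ in range(n_items):
        k = rng.choices(range(layer_sizes[0]), weights=top)[0]
        path = [k]
        for l, table in enumerate(transitions):
            k = rng.choices(range(layer_sizes[l + 1]), weights=table[k])[0]
            path.append(k)
        assignments.append(tuple(path))
    return assignments


if __name__ == "__main__":
    for path in sample_layered_assignments(5, [3, 4, 5], alpha=1.0, seed=1):
        print(path)  # one index per layer for each item
```

Under these assumptions, reading off the first index of each tuple gives a coarse labelling and the later indices give progressively finer ones; a segmentation in the paper's sense would additionally require the sequential dependence that this sketch omits.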