Conference: Uncertainty in Artificial Intelligence

Multidimensional counting grids: Inferring word order from disordered bags of words


Abstract

Models of bags of words typically assume topic mixing, so that the words in a single bag come from a limited number of topics. We show here that many sets of bags of words exhibit a very different pattern of variation than the patterns that are efficiently captured by topic mixing. In many cases, from one bag of words to the next, words disappear and new ones appear as if the theme slowly and smoothly shifted across documents (provided that the documents are somehow ordered). Examples of latent structure that describe such ordering are easily imagined. For example, the advancement of the date of news stories is reflected in a smooth change in the theme of the day, as certain evolving news stories fall out of favor and new events create new stories. Overlaps among the stories of consecutive days can be modeled by using windows over linearly arranged tight distributions over words. We show here that such a strategy can be extended to multiple dimensions and to cases where the ordering of data is not readily obvious. We demonstrate that this way of modeling covariation in word occurrences outperforms standard topic models in classification and prediction tasks in applications in biology, text modeling and computer vision.
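
The windowing idea in the abstract can be illustrated with a small one-dimensional sketch. This is not the authors' implementation. It assumes that each grid cell holds a sparse distribution over words, that a bag is drawn from the average of the cell distributions inside its window, and that the grid wraps around at its ends. The function names, grid size, vocabulary size, window width and Dirichlet concentration are all hypothetical choices made for illustration.

```python
import numpy as np

def make_grid(num_cells, vocab_size, rng, concentration=0.05):
    """Hypothetical 1-D grid: one sparse ("tight") word distribution per cell."""
    return rng.dirichlet(np.full(vocab_size, concentration), size=num_cells)

def window_distribution(grid, start, width):
    """Average the cell distributions inside a window (assumed to wrap around)."""
    cells = (start + np.arange(width)) % grid.shape[0]
    return grid[cells].mean(axis=0)

def sample_bag(grid, start, width, num_words, rng):
    """Draw word counts for one bag from the window's averaged distribution."""
    return rng.multinomial(num_words, window_distribution(grid, start, width))

def l1_distance(p, q):
    """Total absolute difference between two word distributions."""
    return float(np.abs(p - q).sum())

rng = np.random.default_rng(0)
width = 5
grid = make_grid(num_cells=30, vocab_size=200, rng=rng)

h0 = window_distribution(grid, 0, width)    # window over cells 0..4
h1 = window_distribution(grid, 1, width)    # cells 1..5, shares 4 cells with h0
h15 = window_distribution(grid, 15, width)  # cells 15..19, shares none with h0

bag = sample_bag(grid, start=0, width=width, num_words=100, rng=rng)

print("adjacent windows:", l1_distance(h0, h1))
print("distant windows: ", l1_distance(h0, h15))
```

Because adjacent windows differ in only one cell, their averaged distributions differ by at most 2/width in L1 distance, which is one way to read the "slow and smooth" shift of theme between consecutive bags. Windows that share no cells carry no such bound.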

